Title :
Image spam hunter
Author :
Gao, Yan ; Yang, Ming ; Zhao, Xiaonan ; Pardo, Bryan ; Wu, Ying ; Pappas, Thrasyvoulos N. ; Choudhary, Alok
Author_Institution :
EECS Dept., Northwestern Univ., Evanston, IL
fDate :
March 31, 2008 - April 4, 2008
Abstract :
Spammers are constantly creating sophisticated new weapons in their arms race with anti-spam technology, the latest of which is image-based spam. The newest image-based spam uses simple image processing techniques to vary the content of individual messages, e.g., by changing foreground colors, backgrounds, or font types, or even by rotating the images and adding artifacts to them. These variations pose great challenges to conventional spam filters. In this paper, we propose a system that uses a probabilistic boosting tree to determine whether an incoming image is spam based on global image features, i.e., color and gradient orientation histograms. The system identifies spam without the need for OCR and is robust to the kinds of variation found in current spam images. Evaluation results show that the system correctly classifies 90% of spam images while mislabeling only 0.86% of non-spam images as spam.
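A minimal sketch of the two global features named in the abstract, assuming a joint RGB color histogram and a magnitude-weighted histogram of unsigned gradient orientations. The bin counts, color space, grayscale weights, gradient operator, and normalization below are illustrative assumptions, since this record does not specify them, and the function names are hypothetical. The probabilistic boosting tree classifier that consumes these vectors is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names; not taken from the paper.
Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
Image = Sequence[Sequence[Pixel]]


def color_histogram(image: Image, bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram with bins_per_channel**3 cells, normalized to sum to 1."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    count = 0
    for row in image:
        for r, g, b in row:
            # Map each 0..255 channel value to a bin index in 0..n-1.
            ri, gi, bi = (r * n) // 256, (g * n) // 256, (b * n) // 256
            hist[(ri * n + gi) * n + bi] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def gradient_orientation_histogram(image: Image, n_bins: int = 8) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations in [0, pi)."""
    # Assumed luminance conversion (ITU-R BT.601 weights).
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]
    h = len(gray)
    w = len(gray[0]) if h else 0
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences on interior pixels only.
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            theta = math.atan2(gy, gx) % math.pi  # fold direction into [0, pi)
            idx = min(int(theta / math.pi * n_bins), n_bins - 1)
            hist[idx] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def global_features(image: Image) -> List[float]:
    """Concatenate both histograms into one fixed-length feature vector."""
    return color_histogram(image) + gradient_orientation_histogram(image)


if __name__ == "__main__":
    # 4x4 image: left half black, right half white (a single vertical edge).
    img = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
    feats = global_features(img)
    print(len(feats))  # 64 color bins + 8 orientation bins
```

Because both histograms are normalized, every image maps to a vector of the same length regardless of its size, which is what allows a classifier to compare images directly without OCR.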
Keywords :
image classification; image colour analysis; probability; trees (mathematics); unsolicited e-mail; color histogram; gradient orientation histogram; image features; image processing technology; image-based spam hunter; probabilistic boosting tree; Bayesian methods; Boosting; Electronic mail; Filtering; Histograms; Matched filters; Optical character recognition software; Optical filters; Robustness; Unsolicited electronic mail; Image spam
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4517972