Title :
An Approach to Image Spam Filtering Based on Base64 Encoding and N-Gram Feature Extraction
Author :
Xu, Congfu ; Chen, Yafang ; Chiew, Kevin
Author_Institution :
Inst. of Artificial Intell. Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
Abstract :
Compared with text spam, image spam is a variant invented to evade traditional text-based spam classification and filtering. Various approaches to image spam filtering have been proposed, each with its own advantages and drawbacks in terms of time cost and efficiency. In this paper, we propose a new approach based on Base64 encoding of image files and the n-gram technique for feature extraction. After transforming images into their Base64 representation, we extract features from each image with the n-gram technique. With these features we train an SVM (support vector machine), which proves effective and efficient at distinguishing spam images from legitimate ones. Using an online shared personal corpus of images as input, experimental results show that our approach, in comparison with some existing feature extraction methods, achieves very high performance for image spam classification on basic measures such as accuracy, precision, and recall. Moreover, our approach demonstrates its practicality by requiring less running time for image spam classification than other methods.
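The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `base64_ngrams` and `to_vector`, the choice of character bigrams (n = 2), and the relative-frequency normalization are all assumptions made for illustration, and the sample bytes stand in for a real image file. The resulting vectors are the kind of input one might pass to an SVM trainer, which is omitted here.

```python
import base64
from collections import Counter


def base64_ngrams(image_bytes, n=2):
    """Count overlapping character n-grams in the Base64 text of a file.

    Hypothetical helper: the abstract does not state the value of n or
    how n-grams are counted, so overlapping character n-grams are assumed.
    """
    text = base64.b64encode(image_bytes).decode("ascii")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts, vocabulary):
    """Map n-gram counts onto a fixed vocabulary as relative frequencies.

    Hypothetical helper: the normalization is an assumption.
    """
    total = sum(counts.values()) or 1
    return [counts[gram] / total for gram in vocabulary]


# Stand-in bytes; a real pipeline would read image files from disk.
sample = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
counts = base64_ngrams(sample, n=2)
vocabulary = sorted(counts)            # in practice, built from the training corpus
vector = to_vector(counts, vocabulary)
```

Because Base64 text is drawn from a 64-character alphabet plus padding, the bigram vocabulary is bounded, which keeps the feature vectors at a fixed, modest size regardless of image dimensions.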
Keywords :
feature extraction; image classification; image coding; support vector machines; unsolicited e-mail; Base64 encoding; SVM; image spam classification; image spam filtering; n-gram feature extraction; support vector machine; text-based spam classification; text-based spam filtering; Feature extraction; Image coding; Image color analysis; Optical character recognition software; Support vector machines; Unsolicited electronic mail;
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
Conference_Location :
Arras
Print_ISBN :
978-1-4244-8817-9
DOI :
10.1109/ICTAI.2010.31