Title :
Content-based spam filtering
Author :
Almeida, Tiago A. ; Yamakami, Akebo
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Campinas - UNICAMP, Campinas, Brazil
Abstract :
The growth in the number of email users has led to a dramatic increase in spam emails. Fortunately, several approaches can automatically detect and remove most of these messages; the best known are based on Bayesian decision theory and Support Vector Machines. However, there are several forms of Naive Bayes filters, something the anti-spam literature does not always acknowledge. In this paper, we discuss seven different versions of Naive Bayes classifiers and compare them with the well-known linear Support Vector Machine on six non-encoded datasets. Moreover, we propose a new measure for evaluating the quality of anti-spam classifiers. To this end, we investigate the benefits of using the Matthews correlation coefficient as the measure of performance.
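As a brief illustration of the performance measure named in the abstract, the following is a minimal sketch of the standard Matthews correlation coefficient computed from binary confusion-matrix counts. The function name, argument names, and the example counts are illustrative assumptions and are not taken from the paper; returning 0.0 when the denominator is zero is a common convention, not something the abstract specifies.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC from confusion-matrix counts (spam treated as the positive class).

    tp: spam correctly flagged        tn: legitimate mail correctly passed
    fp: legitimate mail flagged       fn: spam that slipped through
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, chosen only to exercise the formula.
score = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
print(f"MCC = {score:.4f}")
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), and it uses all four cells of the confusion matrix, so it remains informative when the spam and legitimate classes are unbalanced.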
Keywords :
Bayes methods; content-based retrieval; information filtering; support vector machines; unsolicited e-mail; Bayesian decision theory; Matthews correlation coefficient; content-based spam filtering; email users; linear support vector machine; naive Bayes classifiers; Electronic mail; Gaussian distribution; Manganese; Niobium; Support vector machines; Training
Conference_Title :
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6916-1
DOI :
10.1109/IJCNN.2010.5596569