DocumentCode
3206834
Title
Some empirical results on two spam detection methods
Author
Matsumoto, Ryota ; Zhang, Du ; Lu, Meiliu
Author_Institution
Dept. of Comput. Sci., California State Univ., Sacramento, CA, USA
fYear
2004
fDate
8-10 Nov. 2004
Firstpage
198
Lastpage
203
Abstract
In this paper, we describe the results of an empirical study of two spam detection methods: support vector machines (SVMs) and the naive Bayes classifier (NBC). To conduct the study, we implement the NBC ourselves and use SVMlight, an implementation of SVMs developed by Thorsten Joachims. The NBC and linear SVMs with different values of the C parameter are trained on a set of 2000 emails (1000 spam and 1000 non-spam) and tested on 200 new emails, 100 in each class. A program is written that converts emails into feature vectors using both the TF and TF-IDF term weighting methods. The evaluation criteria are accuracy rate, recall, precision, miss rate, and false alarm rate. The results indicate that both approaches have their pros and cons.
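The five evaluation criteria named in the abstract can all be derived from a confusion matrix. The sketch below is illustrative only and is not taken from the paper: it assumes spam is the positive class, that "miss rate" means the false negative rate, and that "false alarm rate" means the false positive rate. The abstract does not define these terms, and the function name `evaluate` and the label strings are hypothetical.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute the five criteria from true and predicted labels.

    Assumption (not from the paper): `positive` marks the spam class,
    miss rate = FN / (TP + FN), false alarm rate = FP / (FP + TN).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # spam correctly flagged
        elif truth == positive:
            fn += 1  # spam missed
        elif pred == positive:
            fp += 1  # legitimate mail flagged as spam
        else:
            tn += 1  # legitimate mail passed through

    def ratio(num, den):
        # Guard against empty classes rather than raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "miss_rate": ratio(fn, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


if __name__ == "__main__":
    # Toy labels for illustration; these are not results from the study.
    truth = ["spam"] * 4 + ["ham"] * 4
    preds = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
    for name, value in evaluate(truth, preds).items():
        print(f"{name}: {value:.2f}")
```

Under these definitions recall and miss rate sum to 1, so reporting both mainly emphasizes the cost of missed spam alongside the cost of false alarms on legitimate mail.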
Keywords
Bayes methods; pattern classification; support vector machines; unsolicited e-mail; C parameters; TF term weighting method; TF-IDF term weighting method; accuracy rate; email; false alarm rate; feature vectors; miss rate; naive Bayes classifier; nonspams; precision; recall; spam detection methods; Computer science; Ducts; Electronic mail; Internet; Niobium compounds; Support vector machine classification; Testing; Text categorization; Unsolicited electronic mail
fLanguage
English
Publisher
ieee
Conference_Titel
Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
Print_ISBN
0-7803-8819-4
Type
conf
DOI
10.1109/IRI.2004.1431460
Filename
1431460