Title :
Evaluation of Approaches for Dimensionality Reduction Applied with Naive Bayes Anti-Spam Filters
Author :
Almeida, Tiago A. ; Yamakami, Akebo ; Almeida, Jurandy
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Campinas UNICAMP, Campinas, Brazil
Abstract :
Several approaches can automatically detect e-mail spam messages, and the best-known ones are based on Bayesian decision theory. However, most of these approaches share the same difficulty: the high dimensionality of the feature space. Many term selection methods have been proposed in the literature. Nevertheless, it is still unclear how the performance of naive Bayes anti-spam filters depends on the methods applied for reducing the dimensionality of the feature space. In this paper, we compare the performance of the most popular term selection techniques, such as document frequency, information gain, mutual information, the χ² statistic, and odds ratio, used for reducing the dimensionality of the term space with four well-known versions of the naive Bayes spam filter.
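As an illustration of what term selection means here, the sketch below scores terms by two of the listed criteria, document frequency and the χ² statistic, using the standard textbook contingency-table definitions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `term_scores` and `select_terms`, the toy corpus, and the "spam"/"ham" labels are all hypothetical, and the paper's exact formulations may differ.

```python
from collections import Counter


def term_scores(docs, labels, target="spam"):
    """Return {term: (document_frequency, chi_square)} for a labelled corpus.

    docs   : list of token lists (one list per message)
    labels : list of class labels, same length as docs
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df_all = Counter()     # number of documents containing the term
    df_target = Counter()  # number of target-class documents containing it
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == target:
                df_target[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_target[term]    # target docs with the term
        b = df - a             # other docs with the term
        c = n_target - a       # target docs without the term
        d = n - n_target - b   # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi2 = n * (a * d - c * b) ** 2 / denom if denom else 0.0
        scores[term] = (df, chi2)
    return scores


def select_terms(scores, k, by="chi2"):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    idx = 0 if by == "df" else 1
    ranked = sorted(scores, key=lambda t: (-scores[t][idx], t))
    return ranked[:k]


# Toy corpus, invented purely for illustration.
docs = [
    ["cheap", "pills", "now"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
scores = term_scores(docs, labels)
```

On this toy data the two criteria disagree: document frequency favours the term that appears in the most messages regardless of class, while χ² favours terms whose presence is strongly associated with one class. Only the selected terms would then be passed to the naive Bayes filter as features.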
Keywords :
Bayes methods; belief networks; decision theory; e-mail filters; unsolicited e-mail; Bayesian decision theory; E-mail spam messages detection; χ² statistic technique; antispam filters; dimensionality reduction approach; document frequency technique; information gain technique; mutual information technique; naive Bayes algorithm; odds ratio technique; term selection techniques; Application software; Bayesian methods; Electronic mail; Filtering; Filters; Frequency; Machine learning; Support vector machine classification; Support vector machines; Text categorization; Dimensionality reduction; anti-spam classifier; machine learning
Conference_Title :
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
978-0-7695-3926-3
DOI :
10.1109/ICMLA.2009.22