Title :
Improved spam e-mail filtering based on committee machines and information theoretic feature extraction
Author :
Zorkadis, V. ; Panayotou, M. ; Karras, D.A.
Author_Institution :
Data Protection Authority, Athens, Greece
fDate :
31 July-4 Aug. 2005
Abstract :
A novel approach to spam e-mail filtering is considered, based on committee machine neural network models and on information-theoretic feature extraction. An extensive experimental study, the most extensive so far in the literature, is conducted on widely accepted benchmark e-mail data sets. It compares the proposed methodology with the naive Bayes spam filter, with the boosting tree methodology, with linear-model-based classification (classification via regression), and with nonlinear-model-based classification using simple neural network models, including multilayer perceptrons. Moreover, several feature extraction approaches based on information theory are evaluated. It is shown that the mail categorization performance of committee machines compares very favorably with that of the rival methods, including the Bayes spam filter, which is the most widely used approach in the e-mail services market. It is also found that the proposed information-theoretic Boolean features achieve remarkably high spam categorization performance compared to their analog counterparts.
Keywords :
belief networks; classification; feature extraction; information filtering; information theory; neural nets; unsolicited e-mail; boosting tree methodology; committee machines mail categorization; committee machines neural network models; e-mail data sets; information theoretic feature extraction; linear models based classification; multilayer perceptrons; naive Bayes spam filter; nonlinear models based classification; spam e-mail filtering; Boosting; Classification tree analysis; Electronic mail; Feature extraction; Information filtering; Information filters; Multi-layer neural network; Neural networks; Nonlinear filters; Regression tree analysis;
Conference_Title :
Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on
Print_ISBN :
0-7803-9048-2
DOI :
10.1109/IJCNN.2005.1555826