DocumentCode :
2831746
Title :
Adaptive spam filtering using dynamic feature space
Author :
Zhou, Yan ; Mulekar, Madhuri S. ; Nerellapalli, Praveen
Author_Institution :
Sch. of CIS, South Alabama Univ., Mobile, AL
fYear :
2005
fDate :
16-16 Nov. 2005
Lastpage :
309
Abstract :
Unsolicited bulk e-mail, also known as spam, is a growing problem for the e-mail community. This paper presents a new spam filtering strategy that 1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of e-mail collections over time, and 2) applies an online algorithm to adaptively refine the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed data distribution. We compare our technique with several existing off-line learning techniques, including support vector machine, naive Bayes, k-nearest neighbor, C4.5 decision tree, RBFNetwork, boosted decision tree, and stacking, and demonstrate its effectiveness with experimental results on publicly available e-mail data.
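The abstract names Huffman coding as the entropy coder but this record does not describe how the paper applies it to the feature space. As a rough illustration of the coding step only, the following sketch builds a generic Huffman code over token frequencies and returns each token's code length. The function name `huffman_code_lengths` and the toy token list are assumptions made for this example and are not taken from the paper.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: Huffman code length in bits} for a frequency table.

    Generic textbook Huffman construction; NOT the paper's encoding scheme,
    which this record does not specify.
    """
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {next(iter(freqs)): 1}

    # Heap entries: (subtree weight, unique tie-breaker, {symbol: depth}).
    # The tie-breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them moves
        # one level deeper, so its code grows by one bit.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    tokens = "free free free offer offer meeting".split()
    lengths = huffman_code_lengths(Counter(tokens))
    for token, bits in sorted(lengths.items()):
        print(token, bits)
```

Frequent tokens receive shorter codes, so rebuilding the table as token frequencies change is one plausible way a code could track vocabulary change over time. How the paper derives its single-dimension input from such codes is not stated here.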
Keywords :
Huffman codes; entropy codes; unsolicited e-mail; Huffman coding; adaptive learning; adaptive spam filtering; concept drifting; dynamic feature space; entropy coding; online algorithm; skewed data distribution; unsolicited bulk email; vocabulary change; Adaptive filters; Decision trees; Electronic mail; Entropy coding; Filtering algorithms; Huffman coding; Machine learning; Robustness; Support vector machines; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence, 2005. ICTAI 05. 17th IEEE International Conference on
Conference_Location :
Hong Kong
ISSN :
1082-3409
Print_ISBN :
0-7695-2488-5
Type :
conf
DOI :
10.1109/ICTAI.2005.28
Filename :
1562953