Title :
Using Feature Selection to Speed Up Online SVM Based Spam Filtering
Author :
Shen, Yuewu ; Sun, Guanglu ; Qi, Haoliang ; He, Xiaoning
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Univ. of Sci. & Technol., Harbin, China
Abstract :
In this paper, we propose a feature selection method to speed up online SVM-based spam filtering. Online SVMs give state-of-the-art classification performance for online spam filtering on large benchmark data sets. However, their computational cost is very high for large-scale applications. Feature selection is a crucial step in online SVM classification. We use a feature selection method based on Bayesian reasoning, applied on top of n-gram feature extraction. The method reduces the dimension of the feature vector and slightly improves filter performance, and it greatly reduces the computational cost of online SVM-based spam filters. Experimental results show that the filter with feature selection outperforms the pure online SVM for large-scale spam filtering.
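The abstract does not give the scoring formula, so the following is only a minimal sketch of what Bayesian feature selection over n-gram features could look like, not the authors' method. It assumes character 4-grams, equal class priors, Laplace smoothing, and a score of |P(spam | gram) - 0.5|; the names `char_ngrams`, `select_features`, `to_vector`, and the `top_k` parameter are all hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Return the set of overlapping character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def select_features(messages, labels, n=4, top_k=1000):
    """Keep the top_k n-grams whose P(spam | gram) is farthest from 0.5.

    Assumption: this scoring rule is illustrative; the paper's exact
    criterion is not stated in the abstract.
    """
    spam_df, ham_df = Counter(), Counter()
    for text, is_spam in zip(messages, labels):
        # Count document frequency per class (each gram once per message).
        (spam_df if is_spam else ham_df).update(char_ngrams(text, n))

    n_spam = sum(1 for y in labels if y)
    n_ham = len(labels) - n_spam

    scores = {}
    for gram in set(spam_df) | set(ham_df):
        # Laplace-smoothed likelihoods P(gram | class).
        p_gram_spam = (spam_df[gram] + 1) / (n_spam + 2)
        p_gram_ham = (ham_df[gram] + 1) / (n_ham + 2)
        # Bayes' rule with equal class priors (an assumption).
        p_spam_gram = p_gram_spam / (p_gram_spam + p_gram_ham)
        scores[gram] = abs(p_spam_gram - 0.5)

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return set(ranked[:top_k])


def to_vector(text, selected, n=4):
    """Binary feature vector, represented as the set of selected n-grams present."""
    return char_ngrams(text, n) & selected
```

Under these assumptions, only n-grams in the selected set would be passed to the online SVM, which is where the reduction in feature vector dimension and training cost would come from.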
Keywords :
inference mechanisms; information filtering; support vector machines; unsolicited e-mail; Bayesian reasoning; benchmark data sets; computational cost; feature selection method; n-gram feature extraction; online SVM based spam filtering; Bayesian methods; Cognition; Computational efficiency; Electronic mail; Feature extraction; Filtering; Support vector machines; Feature selection; spam filtering; support vector machine (SVM);
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.37