DocumentCode :
3228417
Title :
Personalized Spam Filtering with Semi-supervised Classifier Ensemble
Author :
Cheng, Victor ; Li, C.H.
Author_Institution :
Dept. of Comput. Sci., Hong Kong Baptist Univ.
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
195
Lastpage :
201
Abstract :
The proliferation of unsolicited emails, also known as spam, poses a significant burden to email users worldwide. Recent research on spam filtering has shown that high accuracy can be obtained if labeled email examples are available from the particular user of the spam filter. However, the time-consuming process of providing personalized labeled training examples is often inconvenient, or impossible due to privacy issues. In this paper, a semi-supervised personalized spam filter based on a classifier ensemble is proposed that classifies a user's emails accurately by learning from both generic labeled emails and personalized unlabeled emails. The proposed multi-stage classification process begins by learning an SVM model from labeled generic data. The user's unlabeled emails are then fed to this SVM to generate personalized labeled data for constructing personalized naive Bayes classifiers. Furthermore, some personalized labeled examples are generated by exploiting rare word distributions and then fed into a semi-supervised classifier. The multi-stage results are integrated with SVMs learned from generic labeled emails to produce the final classification results. Experimental results show that the proposed approaches can significantly increase classification accuracy in spam filtering.
Keywords :
information filtering; learning (artificial intelligence); pattern classification; support vector machines; unsolicited e-mail; multistage classification process; semi-supervised Bayes classifier ensemble; semi-supervised personalized spam filtering; support vector machines; unsolicited email; Computer science; Information filtering; Information filters; Internet; Machine learning; Privacy; Support vector machine classification; Support vector machines; Testing; Unsolicited electronic mail;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.132
Filename :
4061366