DocumentCode
2490441
Title
Heterogeneous Bayesian ensembles for classifying spam emails
Author
Wang, Wenjia
Author_Institution
Sch. of Comput. Sci., Univ. of East Anglia, Norwich, UK
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
8
Abstract
Spam emails have become a major problem in internet communication and can cause potentially serious adverse effects on the recipients if unidentified. Many spam filters have been developed to filter out certain spam emails, but as spammers continuously improve their spamming techniques, the existing filters may become less effective. This paper presents a heterogeneous ensemble approach that combines several methodologically different filters to work collectively to improve accuracy and reliability in identifying spam emails. A special procedure for building heterogeneous and homogeneous ensembles with a Bayesian filter as the base learner has been devised, and a framework has been designed and implemented. After verifying the framework intensively with 10 other benchmark data sets, it was applied to identify spam emails. The experiments with a spam benchmark corpus indicated that the heterogeneous ensembles achieved more accurate and reliable classifications than the individual and other ensemble filters.
Keywords
Internet; belief networks; information filtering; software reliability; unsolicited e-mail; Bayesian filter; Internet communication; heterogeneous Bayesian ensembles; spam benchmark corpus; spam emails; spamming techniques; Accuracy; Buildings; Data models; Electronic mail; Filtering algorithms; Niobium; Probability;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location
Barcelona
ISSN
1098-7576
Print_ISBN
978-1-4244-6916-1
Type
conf
DOI
10.1109/IJCNN.2010.5596545
Filename
5596545