DocumentCode
1868483
Title
Symbiotic Data Mining for Personalized Spam Filtering
Author
Cortez, Paulo ; Lopes, Clotilde ; Sousa, Pedro ; Rocha, Miguel ; Rio, Miguel
Volume
1
fYear
2009
fDate
15-18 Sept. 2009
Firstpage
149
Lastpage
156
Abstract
Unsolicited e-mail (spam) is a severe problem due to intrusion of privacy, online fraud, viruses and time spent reading unwanted messages. To solve this issue, Collaborative Filtering (CF) and Content-Based Filtering (CBF) solutions have been adopted. We propose a new CBF-CF hybrid approach called Symbiotic Data Mining (SDM), which aims at aggregating distinct local filters in order to improve filtering at a personalized level using collaboration while preserving privacy. We apply SDM to spam e-mail detection and compare it with a local CBF filter (i.e., Naive Bayes). Several experiments were conducted using a novel corpus based on the well-known Enron datasets mixed with recent spam. The results show that the symbiotic strategy is competitive in performance when compared to CBF and also more robust to contamination attacks.
Keywords
Collaboration; Data mining; Data privacy; Electronic mail; Filtering; Filters; Robustness; Symbiosis; Unsolicited electronic mail; Viruses (medical); Collaborative Filtering; Content-based Filtering; Naive Bayes; Spam Classification; Text Mining
fLanguage
English
Publisher
iet
Conference_Titel
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Milan, Italy
Print_ISBN
978-0-7695-3801-3
Electronic_ISBN
978-1-4244-5331-3
Type
conf
DOI
10.1109/WI-IAT.2009.30
Filename
5286081