DocumentCode
717114
Title
Analysis of the evolution of features in classification problems with concept drift: Application to spam detection
Author
Henke, Marcia ; Souto, Eduardo ; dos Santos, Eulanda M.
Author_Institution
Inst. of Comput., Fed. Univ. of Amazonas - Manaus, Manaus, Brazil
fYear
2015
fDate
11-15 May 2015
Firstpage
874
Lastpage
877
Abstract
Machine learning solutions for concept drift detection problems try to decide to what extent a particular set of examples still represents the current concept, rather than treating all data equally. Monitoring the set of relevant features used to generate the classification model may be an effective strategy for concept drift detection. This paper analyzes the possibility of detecting drifts through feature evolution monitoring in the spam detection problem. Results of the experiments show that the relevant features of the target domain are significantly different from the relevant features of the source domain. This offers a new possibility for analyzing the relationship between feature evolution and misclassification rate. The experiments were conducted using two databases: a public database composed of samples collected between 2003 and 2004, and a new private database composed of samples collected between 2012 and 2013.
Keywords
learning (artificial intelligence); pattern classification; unsolicited e-mail; classification problems; concept drift detection problems; feature evolution monitoring; machine learning; misclassification rate; spam detection problem; Databases; Error analysis; Feature extraction; Monitoring; Training; Unsolicited electronic mail; concept drift; feature evolution; spam;
fLanguage
English
Publisher
IEEE
Conference_Titel
Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on
Conference_Location
Ottawa, ON
Type
conf
DOI
10.1109/INM.2015.7140398
Filename
7140398