• DocumentCode
    717114
  • Title
    Analysis of the evolution of features in classification problems with concept drift: Application to spam detection
  • Author
    Henke, Marcia ; Souto, Eduardo ; dos Santos, Eulanda M.
  • Author_Institution
    Inst. of Comput., Fed. Univ. of Amazonas - Manaus, Manaus, Brazil
  • fYear
    2015
  • fDate
    11-15 May 2015
  • Firstpage
    874
  • Lastpage
    877
  • Abstract
    Machine Learning solutions for concept drift detection problems try to decide to what extent a particular set of examples still represents the current concept rather than treating all data equally. Monitoring the set of relevant features used to generate the classification model may be an effective strategy for concept drift detection. This paper focuses on analyzing the possibility of detecting drifts through feature evolution monitoring in the spam detection problem. Results of the experiments show that the relevant features of the target domain are significantly different from the relevant features of the source domain. This offers a new possibility for analyzing the relationship between feature evolution and misclassification rate. The experiments were conducted using two databases: a public database composed of samples collected between 2003 and 2004; and a new private database composed of samples collected between 2012 and 2013.
  • Keywords
    learning (artificial intelligence); pattern classification; unsolicited e-mail; classification problems; concept drift detection problems; feature evolution monitoring; machine learning; misclassification rate; spam detection problem; Databases; Error analysis; Feature extraction; Monitoring; Training; Unsolicited electronic mail; concept drift; feature evolution; spam
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on
  • Conference_Location
    Ottawa, ON
  • Type
    conf
  • DOI
    10.1109/INM.2015.7140398
  • Filename
    7140398