• DocumentCode
    2163485
  • Title
    An empirical study on email classification using supervised machine learning in real environments
  • Author
    Li, Wenjuan; Meng, Weizhi
  • Author_Institution
    Department of Computer Science, City University of Hong Kong, Hong Kong
  • fYear
    2015
  • fDate
    8-12 June 2015
  • Firstpage
    7438
  • Lastpage
    7443
  • Abstract
    Spam emails are considered one of the biggest challenges for the Internet. Email classification, which aims to correctly distinguish legitimate emails from spam, has therefore become an important topic for both industry and academia. To achieve this goal, machine learning techniques, especially supervised machine learning algorithms, have been extensively applied to this field. Several studies in the literature reveal that supervised machine learning (SML) suffers from limitations such as performance fluctuation, and many works have therefore focused on designing more complex algorithms. However, we find that most existing research efforts rely on datasets, and more research is needed to investigate the performance of SML in real environments. In this paper, we therefore perform an empirical study of this issue in three different environments with over 1,000 users. The study shows that SML classifiers such as decision trees and SVMs are acceptable to users in real email classification. In addition, we discuss promising directions and provide new insights in this area.
  • Keywords
    Companies; Decision trees; Electronic mail; Feature extraction; Machine learning algorithms; Security; Support vector machines; Email Classification; Empirical Study; Spam Detection; Supervised Machine Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Communications (ICC), 2015 IEEE International Conference on
  • Conference_Location
    London, United Kingdom
  • Type
    conf
  • DOI
    10.1109/ICC.2015.7249515
  • Filename
    7249515