• DocumentCode
    3206834
  • Title
    Some empirical results on two spam detection methods
  • Author
    Matsumoto, Ryota ; Zhang, Du ; Lu, Meiliu
  • Author_Institution
    Dept. of Comput. Sci., California State Univ., Sacramento, CA, USA
  • fYear
    2004
  • fDate
    8-10 Nov. 2004
  • Firstpage
    198
  • Lastpage
    203
  • Abstract
    In this paper, we describe the results of an empirical study of two spam detection methods: support vector machines (SVMs) and the naive Bayes classifier (NBC). To conduct the study, we implement the NBC and use SVMlight, an implementation of SVMs developed by Thorsten Joachims. The NBC and linear SVMs with different C parameters are trained on a set of 2000 emails (1000 spam and 1000 nonspam) and tested on 200 new emails, 100 in each class. A program is written that converts emails into feature vectors using both TF and TF-IDF term weighting methods. The evaluation criteria include accuracy rate, recall, precision, miss rate, and false alarm rate. The results indicate that both approaches have their pros and cons.
  • Keywords
    Bayes methods; pattern classification; support vector machines; unsolicited e-mail; C parameters; TF term weighting method; TF-IDF term weighting method; accuracy rate; email; false alarm rate; feature vectors; miss rate; naive Bayes classifier; nonspams; precision; recall; spam detection methods; support vector machines; Computer science; Ducts; Electronic mail; Internet; Niobium compounds; Support vector machine classification; Support vector machines; Testing; Text categorization; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
  • Print_ISBN
    0-7803-8819-4
  • Type
    conf
  • DOI
    10.1109/IRI.2004.1431460
  • Filename
    1431460