• DocumentCode
    1551415
  • Title

    Support vector machines for spam categorization

  • Author

    Drucker, Harris ; Wu, Donghui ; Vapnik, Vladimir N.

  • Author_Institution
    AT&T Labs-Res., Red Bank, NJ, USA
  • Volume
    10
  • Issue
    5
  • fYear
    1999
  • fDate
    9/1/1999 12:00:00 AM
  • Firstpage
    1048
  • Lastpage
    1054
  • Abstract
    We study the use of support vector machines (SVM) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM performed best when using binary features. For both data sets, boosting trees and SVM had acceptable test performance in terms of accuracy and speed. However, SVM required significantly less training time.
  • Keywords
    electronic mail; learning (artificial intelligence); neural nets; pattern classification; security of data; Ripper; Rocchio; SVM; binary features; boosting decision trees; e-mail classification; spam categorization; support vector machines; Boosting; Classification algorithms; Classification tree analysis; Electronic mail; Filters; Postal services; Support vector machine classification; Support vector machines; Testing; Unsolicited electronic mail;
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type

    jour

  • DOI
    10.1109/72.788645
  • Filename
    788645