• DocumentCode
    387530
  • Title

    Active learning with simplified SVMs for spam categorization

  • Author

    Li, Kun-Lun ; Huang, Hou-Kuan ; Tian, Sheng-Feng

  • Author_Institution
    Sch. of Comput. & Inf. Technol., Northern Jiaotong Univ., Beijing, China
  • Volume
    3
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    1198
  • Abstract
    We propose a method for spam categorization based on support vector machines (SVMs) that uses an active learning strategy. We study the use of SVMs in classifying e-mail as spam or nonspam. Standard algorithms for training SVMs, however, generally produce solutions with more support vectors than are strictly necessary. The paper applies an algorithm that allows the unnecessary support vectors to be recognized and eliminated. We analyze the particular properties of this task and identify why SVMs, especially simplified SVMs, are appropriate for dealing with spam. Instead of using a randomly selected training set, the learner has access to a pool of unlabeled instances and can request the labels for some number of them. We introduce a new method for choosing which instances to request next.
  • Keywords
    electronic mail; learning automata; pattern classification; statistical analysis; text analysis; active learning; e-mail; simplified support vector machines; spam categorization; unlabeled instances; Electronic mail; Information technology; Machine learning; Mathematics; Postal services; Risk management; Support vector machine classification; Support vector machines; Unsolicited electronic mail; Virtual colonoscopy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
  • Print_ISBN
    0-7803-7508-4
  • Type

    conf

  • DOI
    10.1109/ICMLC.2002.1167390
  • Filename
    1167390
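
The pool-based setting described in the abstract can be illustrated with a minimal sketch. The record does not say how the paper chooses which instances to query, nor how it simplifies the SVM, so the code below substitutes a common stand-in: a linear SVM trained by sub-gradient descent, with the learner requesting the label of the unlabeled instance closest to the current decision boundary. All function names, parameters, and the toy data are assumptions made for illustration, not the authors' method.

```python
import random


def decision(w, x):
    """Signed score w.x + b; the last entry of w is the bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=50):
    """Sub-gradient descent on the L2-regularized hinge loss (labels in {-1, +1})."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, xi) < 1      # point is inside or beyond the margin
            w = [(1 - eta * lam) * wj for wj in w]   # shrink step (bias is shrunk too)
            if violated:
                xa = list(xi) + [1.0]                # constant feature carries the bias
                w = [wj + eta * yi * xj for wj, xj in zip(w, xa)]
    return w


def select_query(w, candidates):
    """Stand-in query rule (assumption): pick the candidate nearest the boundary."""
    return min(range(len(candidates)), key=lambda i: abs(decision(w, candidates[i])))


def active_learn(pool, oracle, seed_idx, n_queries):
    """Pool-based loop: train, query one label, retrain. Returns (weights, labeled indices)."""
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    labels = {i: oracle(i) for i in labeled}
    w = train_linear_svm([pool[i] for i in labeled], [labels[i] for i in labeled])
    for _ in range(n_queries):
        if not unlabeled:
            break
        k = select_query(w, [pool[i] for i in unlabeled])
        i = unlabeled.pop(k)
        labels[i] = oracle(i)                        # request the label for this instance
        labeled.append(i)
        w = train_linear_svm([pool[j] for j in labeled], [labels[j] for j in labeled])
    return w, labeled


def make_toy_pool(n=40, seed=0):
    """Hypothetical 2-D data: two well-separated clusters standing in for spam/nonspam."""
    rng = random.Random(seed)
    pool, truth = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        pool.append([label * rng.uniform(1.5, 2.5), label * rng.uniform(1.5, 2.5)])
        truth.append(label)
    return pool, truth
```

Calling `active_learn(pool, lambda i: truth[i], seed_idx=[0, 1], n_queries=5)` on the output of `make_toy_pool()` labels only seven of the forty instances. Real e-mail features would be high-dimensional text vectors, and a kernel SVM with support-vector reduction would replace the linear learner used here.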