• DocumentCode
    2482668
  • Title
    An Online Linear Chinese Spam Emails Filtering System
  • Author
    Qiu, Yongqin; Xu, Yan; Wang, Bin
  • Author_Institution
    Beijing Language & Culture Univ., Beijing, China
  • fYear
    2010
  • fDate
    22-23 May 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Spam is a key problem in electronic communication. The increasing volume of spam has become a serious threat not only to the Internet but also to society. Content-based filtering is one mainstream method of combating this threat in its various forms, but previous content-based filtering methods have struggled to balance efficiency and effectiveness. In this paper we seek a linear solution to this problem: two online linear classifiers, the Perceptron and Winnow, are explored for this task on three benchmark corpora, namely the English corpora PU1 and Lingspam and the Chinese corpus 2005-Jun. Our experiments show that both classifiers filter spam emails effectively as well as efficiently. It is also shown that they perform much better than a standard Naïve Bayes method. In fact, to the best of our knowledge, they achieve state-of-the-art performance for filtering Chinese spam emails, at least on the above corpora. Furthermore, both classifiers are easily updated adaptively and are thus suitable for real dynamic environments.
  • Keywords
    Bayes methods; Internet; e-mail filters; perceptrons; security of data; unsolicited e-mail; Chinese corpus; Chinese spam emails filtering system; English corpus PU1; Internet threat; Lingspam; Perceptron; Winnow; content-based filtering; corpora; electronic communication; mainstream method; online linear classifiers; real dynamic environment; society threat; standard Naive Bayes method; state-of-the-art performance; Bayesian methods; Computers; Electronic mail; Information filtering; Information filters; Natural languages; Nonlinear filters; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Business and Information System Security (EBISS), 2010 2nd International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-5893-6
  • Electronic_ISBN
    978-1-4244-5895-0
  • Type
    conf
  • DOI
    10.1109/EBISS.2010.5473478
  • Filename
    5473478
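The abstract describes the Perceptron and Winnow as online linear classifiers that are easy to update adaptively. The sketch below illustrates that property using the textbook, mistake-driven forms of both update rules (additive for the Perceptron, multiplicative for Winnow). It is not the authors' implementation: the feature representation (a set of binary token features per message), the learning rate, Winnow's promotion factor alpha and threshold theta, and the names Perceptron, Winnow, train_online and TOY_DATA are illustrative assumptions. The toy messages are invented and are not drawn from PU1, Lingspam or 2005-Jun.

```python
# Minimal sketch (assumed setup, not the paper's code): textbook online
# Perceptron and Winnow over binary features, each message being a set of tokens.
# Labels: +1 = spam, -1 = ham.


class Perceptron:
    """Mistake-driven Perceptron with additive updates; weights default to 0."""

    def __init__(self, lr=1.0):  # lr is a hypothetical hyperparameter choice
        self.w = {}
        self.b = 0.0
        self.lr = lr

    def score(self, feats):
        return self.b + sum(self.w.get(f, 0.0) for f in feats)

    def predict(self, feats):
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, y):
        """Adjust weights only on a mistake; return True if a mistake occurred."""
        if self.predict(feats) == y:
            return False
        for f in feats:  # only active (present) features change
            self.w[f] = self.w.get(f, 0.0) + self.lr * y
        self.b += self.lr * y
        return True


class Winnow:
    """Mistake-driven Winnow with multiplicative updates; weights default to 1."""

    def __init__(self, alpha=2.0, theta=2.0):  # hypothetical alpha and theta
        self.w = {}
        self.alpha = alpha
        self.theta = theta

    def score(self, feats):
        return sum(self.w.get(f, 1.0) for f in feats)

    def predict(self, feats):
        return 1 if self.score(feats) >= self.theta else -1

    def update(self, feats, y):
        """Promote (false negative) or demote (false positive) active weights."""
        if self.predict(feats) == y:
            return False
        factor = self.alpha if y == 1 else 1.0 / self.alpha
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor
        return True


def train_online(model, stream, epochs=1):
    """Feed labeled messages one at a time; return the total number of mistakes."""
    mistakes = 0
    for _ in range(epochs):
        for feats, y in stream:
            mistakes += model.update(feats, y)
    return mistakes


# Invented toy data; real Chinese mail would first need word segmentation
# or character n-grams to produce token sets (assumed, not shown).
TOY_DATA = [
    ({"free", "invoice", "discount"}, 1),
    ({"meeting", "time", "schedule"}, -1),
    ({"free", "prize", "discount"}, 1),
    ({"paper", "meeting", "revise"}, -1),
]

for model in (Perceptron(), Winnow()):
    n = train_online(model, TOY_DATA, epochs=3)
    print(type(model).__name__, "mistakes:", n,
          "prediction for {free, prize}:", model.predict({"free", "prize"}))
```

Each update touches only the weights of features that occur in the misclassified message, which is what makes incremental retraining on newly labeled mail cheap. A production filter would also need tuned hyperparameters and possibly a variant such as Balanced Winnow; which variant and settings the paper uses is not stated in this record.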