• DocumentCode
    2910216
  • Title

    The Comparison of Chinese Spam Filter Based on Generative Model and Discriminative Model

  • Author

    Han, Yong ; Wang, Yingying ; Ding, Huafu ; Qi, Haoliang

  • Author_Institution
    Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
  • fYear
    2011
  • fDate
    15-17 Nov. 2011
  • Firstpage
    107
  • Lastpage
    110
  • Abstract
    Previous studies have shown that discriminative models outperform generative models for spam filtering, but those results were obtained on English datasets, and studies of Chinese spam filtering are rare. We therefore compare the performance of Bogo, a classical generative model, with Logistic Regression (LR) and Relaxed Online SVM (ROSVM), two typical discriminative models, on Chinese datasets. The Bogo system adopts a generative model based on a Bayesian algorithm. We use the public Chinese datasets TREC06c, SEWM 2008, SEWM 2010, and SEWM 2011 as test sets with immediate feedback. The discriminative models give better results than the generative model, and ROSVM gives the best performance for Chinese spam filtering.
  • Keywords
    belief networks; information filters; natural language processing; regression analysis; support vector machines; unsolicited e-mail; Bayesian algorithm; Bogo system; Chinese spam filter; English dataset; ROSVM; classical generative model; discriminative model; logistic regression; public Chinese dataset; relaxed online SVM; Filtering; Logistics; Machine learning; Support vector machines; Training; Unsolicited electronic mail; Bogo; Chinese spam filter; LR; ROSVM; discriminative model; generative model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2011 International Conference on
  • Conference_Location
    Penang
  • Print_ISBN
    978-1-4577-1733-8
  • Type

    conf

  • DOI
    10.1109/IALP.2011.64
  • Filename
    6121481