• DocumentCode
    1992860
  • Title

    Application of Collocation to Spam Filtering

  • Author

    Zhang, Jing ; Yao, Jianmin ; Dong, Shoubin ; Zhang, Ling

  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
  • Volume
    2
  • fYear
    2008
  • fDate
    21-22 Dec. 2008
  • Firstpage
    715
  • Lastpage
    718
  • Abstract
    Collocations are frequent bigrams that carry semantic meaning and grammatical function. Adjacent and long-distance collocations are extracted as features for a Bayesian classifier in spam filtering. Compared with a classifier built on the common unigram features, the collocation-based classifier improves on all evaluation metrics. The influence of mail header information on the classifier is also studied; it accounts for a 10% change in both precision and recall.
  • Keywords
    Bayes methods; feature extraction; information filtering; pattern classification; unsolicited e-mail; Bayesian classifier; adjacent collocations; collocation-based classifier; common unigram feature; evaluation metrics; feature extraction; grammatical functions; long distance collocations; mail header information; semantic meanings; spam filtering; Bayesian methods; Computer science; Computer science education; Educational technology; Geoscience and remote sensing; Information filtering; Information filters; Postal services; Probability; Unsolicited electronic mail; Adjacent collocation; Bayesian classifier; Long-distance collocation; Spam filtering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2008 International Workshop on Education Technology and Training and 2008 International Workshop on Geoscience and Remote Sensing (ETT and GRS 2008)
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3563-0
  • Type

    conf

  • DOI
    10.1109/ETTandGRS.2008.394
  • Filename
    5070462
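
The feature extraction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give their extraction criteria, so the window size (`max_distance`), the function `collocation_features`, the class `CollocationNaiveBayes`, and the add-one smoothing are all assumptions made for illustration. Adjacent collocations are taken to be neighbouring word pairs, and long-distance collocations to be ordered word pairs separated by at least one other word within the window.

```python
import math
from collections import Counter


def collocation_features(tokens, max_distance=3):
    """Return ordered word pairs as features (hypothetical helper).

    Pairs one position apart are tagged "adj" (adjacent collocation);
    pairs 2..max_distance positions apart are tagged "gap"
    (long-distance collocation). The window size is an assumption.
    """
    features = []
    last = len(tokens) - 1
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + max_distance, last) + 1):
            kind = "adj" if j == i + 1 else "gap"
            features.append((kind, left, tokens[j]))
    return features


class CollocationNaiveBayes:
    """Multinomial naive Bayes over collocation features (illustrative)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.feature_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, tokens, label):
        feats = collocation_features(tokens)
        self.feature_counts[label].update(feats)
        self.doc_counts[label] += 1
        self.vocab.update(feats)

    def log_score(self, tokens, label):
        # Assumes at least one training message per label, so the prior is > 0.
        counts = self.feature_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for feat in collocation_features(tokens):
            if feat not in self.vocab:
                continue  # skip features never seen in training
            # Add-one (Laplace) smoothing over the training vocabulary.
            score += math.log((counts[feat] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, tokens):
        return max(self.LABELS, key=lambda label: self.log_score(tokens, label))


if __name__ == "__main__":
    model = CollocationNaiveBayes()
    model.train("buy cheap pills now".split(), "spam")
    model.train("cheap pills online now".split(), "spam")
    model.train("meeting notes attached now".split(), "ham")
    model.train("see meeting notes tomorrow".split(), "ham")
    print(model.classify("buy cheap pills".split()))
```

A unigram baseline for comparison would replace `collocation_features` with a function that returns the tokens themselves; the toy training messages above are invented and say nothing about the results reported in the paper.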