  • DocumentCode
    48311
  • Title
    An Adaptive Fusion Algorithm for Spam Detection
  • Author
    Congfu Xu ; Baojun Su ; Yunbiao Cheng ; Weike Pan ; Li Chen
  • Author_Institution
    Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
  • Volume
    29
  • Issue
    4
  • fYear
    2014
  • fDate
    July-Aug. 2014
  • Firstpage
    2
  • Lastpage
    8
  • Abstract
    Spam detection has become a critical component of online systems such as email services, advertising engines, and social media sites. Here, the authors use email services as an example and present an adaptive fusion algorithm for spam detection (AFSD), a general, content-based approach that can be applied to nonemail spam detection tasks with little additional effort. The proposed algorithm represents an email by n-grams of its nontokenized text strings, introduces a link function that converts the prediction scores of the online learners so that they are more comparable, trains the online learners in a mistake-driven manner via thick thresholding to obtain highly competitive online learners, and designs update rules that adaptively integrate the online learners to capture different aspects of spam. The prediction performance of AFSD is studied on five public competition datasets and one industry dataset, where the algorithm achieves significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
  • Keywords
    content-based retrieval; security of data; unsolicited e-mail; AFSD; adaptive fusion algorithm; content-based approach; nonemail spam detection task; nontokenized text strings; Adaptation models; Algorithm design and analysis; Feature extraction; Online services; Prediction algorithms; Unsolicited electronic mail; adaptive fusion; intelligent systems; spam detection
  • fLanguage
    English
  • Journal_Title
    Intelligent Systems, IEEE
  • Publisher
    IEEE
  • ISSN
    1541-1672
  • Type
    jour
  • DOI
    10.1109/MIS.2013.54
  • Filename
    6563073
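
Two of the ingredients named in the abstract, n-grams of nontokenized text and mistake-driven training with thick thresholding, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ThickThresholdLogReg`, the choice of logistic regression as the online learner, and the parameters `n`, `rate`, and `margin` are all assumptions made for illustration, and the link function and the fusion update rules of AFSD are not reproduced here.

```python
import math


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of the raw (nontokenized) string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class ThickThresholdLogReg:
    """Hypothetical online logistic-regression learner over character n-grams.

    It updates only when the prediction is wrong or falls inside a
    "thick" band around the 0.5 decision threshold.
    """

    def __init__(self, n=4, rate=0.1, margin=0.25):
        self.n = n              # n-gram length (assumed value)
        self.rate = rate        # learning rate (assumed value)
        self.margin = margin    # half-width of the thick threshold band (assumed)
        self.weights = {}       # one weight per observed n-gram

    def features(self, text):
        # Binary features: each distinct n-gram counts once.
        return set(char_ngrams(text, self.n))

    def score(self, text):
        """Return the estimated probability that the text is spam."""
        z = sum(self.weights.get(g, 0.0) for g in self.features(text))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, text, is_spam):
        """Update on one labeled example; return True if weights changed."""
        p = self.score(text)
        mistaken = (p > 0.5) != is_spam
        uncertain = abs(p - 0.5) < self.margin
        if not (mistaken or uncertain):
            return False  # confident and correct: skip the update
        label = 1.0 if is_spam else 0.0
        for g in self.features(text):
            self.weights[g] = self.weights.get(g, 0.0) + self.rate * (label - p)
        return True


model = ThickThresholdLogReg()
for _ in range(5):
    model.train("buy cheap pills now", True)
    model.train("meeting notes attached", False)
```

After these few passes the learner scores the first string above 0.5 and the second below it, and further calls to `train` on either string leave the weights unchanged because both predictions are correct and outside the band.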