• DocumentCode
    566751
  • Title
    A hierarchical framework for content-based image spam filtering
  • Author
    Li, Xiao Mang ; Kim, Ung Mo
  • Author_Institution
    Sch. of Inf. & Commun. Eng., Sungkyunkwan Univ., Suwon, South Korea
  • Volume
    1
  • fYear
    2012
  • fDate
    26-28 June 2012
  • Firstpage
    149
  • Lastpage
    155
  • Abstract
    Since the 1990s, as spam has become a serious threat to email communication, a prolonged competition between spammers and anti-spam filters has continued to this day. To filter spam based on semantic analysis of email content, many content-based anti-spam approaches have been put forward, such as text-based filtering and image-based filtering. However, the tricks played by spammers have also evolved quickly. It now turns out that any single anti-spam approach is too limited to handle diverse real-world spam effectively. How to combine current techniques to construct more effective anti-spam systems has therefore become the major focus of our research. In this paper, we propose a novel hierarchical anti-spam framework, which adopts multiple techniques, including text classification, image processing, and Optical Character Recognition, in different layers to detect spam. We evaluate the proposed approach on several public spam corpora as well as our personal corpus, and verify its effectiveness in terms of filtering capacity and filtering performance.
  • Keywords
    classification; content-based retrieval; information filtering; optical character recognition; text analysis; unsolicited e-mail; anti-spam filter; anti-spam system; content-based anti-spam approach; content-based image spam filtering; email communication; email content; filtering capacity; filtering performance; hierarchical framework; image processing; image-based filtering; personal corpus; public spam corpora; real-world spam; semantic analysis; spam detection; spammer; text classification; text-based filtering; Accuracy; Training; framework; spam filtering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Digital Content Technology (ICIDT), 2012 8th International Conference on
  • Conference_Location
    Jeju
  • Print_ISBN
    978-1-4673-1288-2
  • Type
    conf
  • Filename
    6269246