• DocumentCode
    3598985
  • Title
    Baeza-Yates and Navarro approximate string matching for spam filtering
  • Author
    Aldwairi, M.; Flaifel, Y.
  • Author_Institution
    Network Eng. & Security Dept., Jordan Univ. of Sci. & Technol., Irbid, Jordan
  • fYear
    2012
  • Firstpage
    16
  • Lastpage
    20
  • Abstract
    Spam has evolved in content, methods, delivery networks, and volume. Reports indicate that up to 90% of World Wide Web email traffic is spam [1]. Its content now covers a wider range, moving beyond the conventional pharmaceuticals and adult content toward more formal marketing campaigns. This illegal advertising is evolving into an underground market for bot masters who rent or sell spam agents. Spam campaigns increasingly adopt new methods to ensure efficient mass delivery and to evade conventional spam detectors. They employ a vast and complicated infrastructure of botnets and fast-flux networks to deliver as many emails as possible. The main concerns in spam detection are detection and misclassification accuracy, and both remain a challenge because of the evolving techniques employed by spammers. In this paper we propose a bit-parallel string matching spam filtering system based on the improved Baeza-Yates and Navarro approximate string matching algorithm. This method has a low computational cost, is easy to implement, and has the potential to catch misspelled keywords. The proposed approach achieves 97.2% overall accuracy with a simple Naive Bayes classifier.
  • Keywords
    Bayes methods; Internet; information filtering; parallel algorithms; pattern classification; software agents; string matching; telecommunication traffic; unsolicited e-mail; Baeza-Yates-Navarro approximate string matching; Botnets; World Wide Web email traffic; bit-parallel string matching spam filtering system; bot masters; detection accuracy; fast flux networks; illegal advertising; mass delivery; misclassification accuracy; naive Bayes classifier; spam agents; spam campaigns; spam detection process; spam detectors; Baeza-Yates and Navarro; Naïve Bayes classifier; Spam filtering; approximate string matching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing Technology (INTECH), 2012 Second International Conference on
  • Print_ISBN
    978-1-4673-2678-0
  • Type
    conf
  • DOI
    10.1109/INTECH.2012.6457802
  • Filename
    6457802
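
The abstract describes bit-parallel approximate string matching as a way to catch misspelled spam keywords. As a rough illustration of that idea only, the sketch below uses the classic Wu-Manber style shift-and recurrence with up to k errors. It is not the improved Baeza-Yates and Navarro algorithm from the paper, and it leaves out the Naive Bayes stage. The function name `contains_approx`, its parameters, and the sample strings are assumptions made for this example.

```python
def contains_approx(pattern: str, text: str, max_errors: int) -> bool:
    """Return True if some substring of `text` is within `max_errors`
    edits (insert, delete, substitute) of `pattern`.

    Hypothetical helper: a Wu-Manber style bit-parallel sketch, not the
    method evaluated in the paper.
    """
    m = len(pattern)
    if m == 0:
        return True
    full = (1 << m) - 1       # keeps each state within m bits
    accept = 1 << (m - 1)     # bit set when the whole pattern has matched

    # masks[ch] has bit i set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # state[d], bit i: pattern[:i+1] matches a suffix of the text read so
    # far with at most d errors. Before any text is read, the first d
    # pattern characters can be "matched" by deleting them.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]
    if state[max_errors] & accept:
        return True           # pattern is no longer than the error budget

    for ch in text:
        mask = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = ((prev_old << 1) | 1) & mask          # exact matching row
        for d in range(1, max_errors + 1):
            cur_old = state[d]
            state[d] = (
                (((cur_old << 1) | 1) & mask)            # character matches
                | prev_old                               # extra text character
                | (((prev_old | state[d - 1]) << 1) | 1) # substituted or missing character
            ) & full
            prev_old = cur_old
        if state[max_errors] & accept:
            return True
    return False


# Example: an obfuscated keyword is caught once one error is allowed.
print(contains_approx("viagra", "buy v1agra now", 0))
print(contains_approx("viagra", "buy v1agra now", 1))
```

Each text character costs one pass over the k + 1 state words, which is where the low computational cost of bit-parallel matching comes from when the keyword fits in a machine word. Python integers are unbounded, so the `full` mask stands in for the word-size limit.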