Title :
Baeza-Yates and Navarro approximate string matching for spam filtering
Author :
Aldwairi, M.; Flaifel, Y.
Author_Institution :
Network Eng. & Security Dept., Jordan Univ. of Sci. & Technol., Irbid, Jordan
Abstract :
Spam has evolved in content, methods, delivery networks and volume. Reports indicate that up to 90% of worldwide email traffic is spam [1]. Its content now covers a wider range of topics, moving beyond conventional pharmaceutical and adult material toward more formal marketing campaigns. This illegal advertising is turning into an underground market in which bot masters rent or sell spam agents. Spam campaigns increasingly adopt new methods to ensure efficient mass delivery and evade conventional spam detectors, relying on large, complex infrastructures of botnets and fast-flux networks to deliver as many emails as possible. The main concerns in spam detection are detection accuracy and misclassification, and both remain challenging because spammers keep changing their techniques. In this paper we propose a bit-parallel string matching spam filtering system based on the improved Baeza-Yates and Navarro approximate string matching algorithm. The method has low computational cost, is easy to implement, and can potentially catch misspelled keywords. The proposed approach achieves 97.2% overall accuracy with a simple Naive Bayes classifier.
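To illustrate the general idea of bit-parallel approximate keyword matching (finding a spam keyword even when it is misspelled by up to k edits), here is a minimal Python sketch. It is an assumption-laden illustration, not the authors' system: it uses the simpler row-wise Wu-Manber "shift-and with errors" formulation rather than the diagonal-wise Baeza-Yates and Navarro algorithm the paper builds on, and the function name `approx_find` and the example keyword are hypothetical.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Hypothetical helper: return end indices in `text` where `pattern`
    occurs with at most k edits (insertions, deletions, substitutions).
    Uses Wu-Manber bit-parallel shift-and, not the Baeza-Yates/Navarro variant."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1       # keep only m state bits
    accept = 1 << (m - 1)     # bit meaning "whole pattern matched"

    # masks[c]: bit i is set iff pattern[i] == c
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # R[d] bit i: pattern[:i+1] matches a suffix of the text read so far
    # with at most d edits. Initially the first d pattern characters can
    # be matched by deleting them.
    R = [((1 << d) - 1) & full for d in range(k + 1)]

    hits: list[int] = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        old_prev = R[0]                       # R[d-1] before reading ch
        R[0] = ((R[0] << 1) | 1) & b & full   # exact shift-and step
        for d in range(1, k + 1):
            old_cur = R[d]
            R[d] = (
                (((old_cur << 1) | 1) & b)    # match the current character
                | ((old_prev << 1) | 1)       # substitution
                | old_prev                    # insertion (extra text character)
                | ((R[d - 1] << 1) | 1)       # deletion (skipped pattern char), new R[d-1]
            ) & full
            old_prev = old_cur
        if R[k] & accept:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approx_find("viagra", "buy v1agra now", 1))  # → [9]
    print(approx_find("viagra", "buy v1agra now", 0))  # → []
```

All k+1 state words are updated with a handful of shifts, ANDs and ORs per text character, which is what keeps this family of algorithms cheap when the pattern fits in a machine word. In a filter along these lines, each keyword hit could serve as a binary feature for a Naive Bayes classifier; how the paper actually builds its features is not stated in this abstract.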
Keywords :
Bayes methods; Internet; information filtering; parallel algorithms; pattern classification; software agents; string matching; telecommunication traffic; unsolicited e-mail; Baeza-Yates-Navarro approximate string matching; Botnets; World Wide Web email traffic; bit-parallel string matching spam filtering system; bot masters; detection accuracy; fast flux networks; illegal advertising; mass delivery; misclassification accuracy; naive Bayes classifier; spam agents; spam campaigns; spam detection process; spam detectors; Baeza-Yates and Navarro; Naïve Bayes classifier; Spam filtering; approximate string matching;
Conference_Title :
Innovative Computing Technology (INTECH), 2012 Second International Conference on
Print_ISBN :
978-1-4673-2678-0
DOI :
10.1109/INTECH.2012.6457802