Title :
Deobfuscation based on edit distance algorithm for spam filtering
Author_Institution :
Dept. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
Abstract :
Spam has grown rapidly on the Internet. Adversaries obfuscate spam messages by misspelling words or inserting useless characters to mislead the spam filter's decision. Humans can still understand the original meaning of the camouflaged words, but the spam filter cannot recognize them. This paper focuses on the well-known obfuscation problem that uses non-alphabetical characters, e.g., "Viagra" modified to "V!@gr@". The string edit distance algorithm is revised to handle non-alphabetical characters. In the experiment, the proposed deobfuscation method outperforms the traditional string edit distance algorithm.
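The abstract does not say how the edit distance is revised, so the following is only a minimal illustrative sketch, not the paper's method. It assumes one plausible revision: substituting a non-alphabetical character costs less than substituting a letter. The names `edit_distance`, `symbol_aware_cost`, `deobfuscate`, the parameter `symbol_cost`, and the value 0.5 are all hypothetical.

```python
def edit_distance(source, target, sub_cost=None):
    """Dynamic-programming edit distance with a pluggable substitution cost.

    With the default cost (0 for equal characters, 1 otherwise) this is the
    traditional Levenshtein distance.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0

    # prev[j] holds the distance between the processed prefix of `source`
    # and the first j characters of `target`.
    prev = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        curr = [float(i)]
        for j, b in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete a from source
                curr[j - 1] + 1.0,             # insert b into source
                prev[j - 1] + sub_cost(a, b),  # substitute a with b
            ))
        prev = curr
    return prev[-1]


def symbol_aware_cost(a, b, symbol_cost=0.5):
    """Assumed revision: substitutions involving a non-letter are cheaper."""
    if a.lower() == b.lower():
        return 0.0
    if not a.isalpha() or not b.isalpha():
        return symbol_cost  # hypothetical discount, not taken from the paper
    return 1.0


def deobfuscate(token, vocabulary):
    """Return the vocabulary word closest to `token` under the assumed cost."""
    return min(vocabulary, key=lambda w: edit_distance(token, w, symbol_aware_cost))


if __name__ == "__main__":
    print(edit_distance("V!@gr@", "Viagra"))                     # traditional
    print(edit_distance("V!@gr@", "Viagra", symbol_aware_cost))  # assumed revision
    print(deobfuscate("V!@gr@", ["Viagra", "Vigor", "Niagara"]))
```

Under these assumptions, the traditional distance between "V!@gr@" and "Viagra" is 3.0, while the symbol-aware variant gives 1.5, so the obfuscated token sits closer to its intended word than to unrelated vocabulary entries.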
Keywords :
formal languages; information filtering; support vector machines; unsolicited e-mail; SVM; backtrack algorithm; deobfuscation method; nonalphabetical character handling; obfuscation problem; spam filtering; spam message; spamming problem; string edit distance algorithm; support vector machine; Abstracts; Barium; Indexes; Unsolicited electronic mail; Spam Filter
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2014 International Conference on
Conference_Location :
Lanzhou
Print_ISBN :
978-1-4799-4216-9
DOI :
10.1109/ICMLC.2014.7009101