Title :
JURD: Joiner of Un-Readable Documents to reverse tokenization attacks to content-based spam filters
Author :
Santos, Igor ; Laorden, C. ; Sanz, B. ; Bringas, Pablo G.
Author_Institution :
S3Lab., Univ. of Deusto, Bilbao, Spain
Abstract :
Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. More than 85% of received e-mails are spam. Historical approaches to combating these messages, including simple techniques like sender blacklisting or the use of e-mail signatures, are no longer completely reliable. Many current solutions feature machine-learning algorithms trained using statistical representations of the terms that most commonly appear in such e-mails. However, there are attacks that can subvert the filtering capabilities of these methods. Tokenization attacks, in particular, insert characters that create divisions within words, causing incorrect representations of e-mails. In this paper, we introduce a new method that reverses the effects of tokenization attacks. Our method processes e-mails iteratively: starting from the first token, it builds candidate words and compares them against a common dictionary to which spam words have been previously added. We provide an empirical study of how tokenization attacks affect the filtering capability of a Bayesian classifier and we show that our method can reverse the effects of tokenization attacks.
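The abstract gives only a high-level description of the method, so the following is a minimal illustrative sketch rather than the authors' JURD implementation. It assumes a tokenization attack that inserts a separator character between letters, and it reverses the attack with a greedy longest-match join of adjacent tokens against a dictionary. The function names (`tokenization_attack`, `split_tokens`, `rejoin_tokens`), the `max_span` limit and the sample dictionary are all hypothetical choices made for this example.

```python
import re


def tokenization_attack(text, sep="."):
    """Simulate an attack: put `sep` between the characters of every word."""
    return " ".join(sep.join(word) for word in text.split())


def split_tokens(text):
    """Split on anything that is not a letter or digit, as a naive filter might."""
    return re.findall(r"[A-Za-z0-9]+", text)


def rejoin_tokens(tokens, dictionary, max_span=12):
    """Greedily join adjacent tokens into the longest word found in `dictionary`.

    `max_span` (an assumption of this sketch) caps how many tokens are
    concatenated into a single candidate word.
    """
    joined = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate first, then shrink it token by token.
        for j in range(min(len(tokens), i + max_span), i, -1):
            candidate = "".join(tokens[i:j]).lower()
            if candidate in dictionary:
                match_end = j
                break
        if match_end is None:
            # No dictionary word starts here; keep the token unchanged.
            joined.append(tokens[i])
            i += 1
        else:
            joined.append("".join(tokens[i:match_end]))
            i = match_end
    return joined


# Hypothetical dictionary: common words plus previously added spam words.
dictionary = {"buy", "cheap", "pills", "now"}

attacked = tokenization_attack("buy cheap pills now")
recovered = rejoin_tokens(split_tokens(attacked), dictionary)
```

Here `attacked` is `"b.u.y c.h.e.a.p p.i.l.l.s n.o.w"`, which a word-level tokenizer would break into single letters, and `recovered` is `["buy", "cheap", "pills", "now"]`. Greedy longest-match is only one possible strategy; it can mis-join tokens when one dictionary word is a prefix of another, and the paper itself may resolve such cases differently.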
Keywords :
Bayes methods; information filters; learning (artificial intelligence); pattern classification; security of data; unsolicited e-mail; Bayesian classifier; JURD; computer security; computer viruses; content-based spam filters; e-mail signatures; machine-learning algorithms; phishing; reverse tokenization attacks; spam words; statistical representations; worms; Dictionaries; Particle separators; Training; Unsolicited electronic mail; Vectors
Conference_Title :
Consumer Communications and Networking Conference (CCNC), 2013 IEEE
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-3131-9
DOI :
10.1109/CCNC.2013.6488455