Title :
Variable length concentration based feature construction method for spam detection
Author :
Yang Gao;Guyue Mi;Ying Tan
Author_Institution :
Key Laboratory of Machine Perception (MOE), Peking University, Department of Machine Intelligence, School of Electronics, Engineering and Computer Science, Beijing, 100871, China
fDate :
7/1/2015 12:00:00 AM
Abstract :
In the field of spam detection, concentration-based methods have been proposed in recent years for feature construction; they convert each email into a fixed-length feature vector. This paper presents a novel method that removes this restriction on feature-vector length. Specifically, the method uses a fixed-length sliding window to divide each email into sections, and the number of sections depends on the length of the email. Consequently, the feature vectors differ in length from one email to another, and this paper names them variable length concentrations (VLC). The method thus obtains feature vectors that adapt to the length of each email. However, general classifiers are not suitable for this kind of feature vector, because they can only handle fixed-length inputs. This paper therefore applies recurrent neural networks (RNNs), whose inputs are not restricted in length, to spam detection. Recall, precision, accuracy and F1 measure are used to evaluate the method's performance. Experimental results on the classic corpora PU1, PU2, PU3 and PUA show that VLC performs significantly better than previously proposed methods, which supports the effectiveness of our method.
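The window-based segmentation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define how a window's concentration is computed, so the hypothetical `concentration` helper takes the fraction of a window's distinct terms that appear in a spam-term set and in a legitimate-term set. The hypothetical `vlc_features` uses non-overlapping windows (step equal to the window length), which is only one reading of "sliding window".

```python
from typing import List, Sequence, Set, Tuple


def concentration(window: Sequence[str], spam_terms: Set[str],
                  ham_terms: Set[str]) -> Tuple[float, float]:
    """Hypothetical, simplified concentration of one window.

    ASSUMPTION: fraction of the window's distinct terms found in the
    spam-term set and in the legitimate (ham) term set, respectively.
    """
    distinct = set(window)
    if not distinct:
        return (0.0, 0.0)
    spam_c = len(distinct & spam_terms) / len(distinct)
    ham_c = len(distinct & ham_terms) / len(distinct)
    return (spam_c, ham_c)


def vlc_features(tokens: Sequence[str], window_len: int, spam_terms: Set[str],
                 ham_terms: Set[str]) -> List[Tuple[float, float]]:
    """Split an email into fixed-length windows and emit one vector per window.

    ASSUMPTION: windows do not overlap, and the last one may be shorter.
    Longer emails yield more vectors, so the output length varies per email.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    return [concentration(tokens[i:i + window_len], spam_terms, ham_terms)
            for i in range(0, len(tokens), window_len)]
```

For example, a 9-token email with `window_len=4` yields three 2-D vectors, while an 18-token email yields five; this growing sequence is what a fixed-input classifier cannot consume directly.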
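Why an RNN accepts variable-length concentration sequences can be illustrated with a tiny forward pass. The sketch below is an assumption-laden example, not the network used in the paper: the abstract does not specify the RNN architecture, hidden size, initialisation or training procedure. The hypothetical `TinyElmanRNN` is an untrained single-layer Elman network with random weights and a sigmoid output. The point it shows is that one fixed set of weights is reused at every time step, so a sequence of any length maps to a single score.

```python
import math
import random
from typing import List, Sequence


def _rand_matrix(rng: random.Random, rows: int, cols: int) -> List[List[float]]:
    # ASSUMPTION: small uniform initialisation; the paper's choice is unknown.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyElmanRNN:
    """Hypothetical, untrained single-layer Elman RNN with a sigmoid output."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W_xh = _rand_matrix(rng, hidden_dim, input_dim)   # input -> hidden
        self.W_hh = _rand_matrix(rng, hidden_dim, hidden_dim)  # hidden -> hidden
        self.b_h = [0.0] * hidden_dim
        self.w_out = _rand_matrix(rng, 1, hidden_dim)[0]       # hidden -> output
        self.b_out = 0.0

    def forward(self, sequence: Sequence[Sequence[float]]) -> float:
        """Return a score in (0, 1), read here as a spam score (assumption)."""
        h = [0.0] * len(self.b_h)  # initial hidden state
        for x in sequence:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the right side uses the old h.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(row_x, x))
                    + sum(w * hj for w, hj in zip(row_h, h))
                    + b
                )
                for row_x, row_h, b in zip(self.W_xh, self.W_hh, self.b_h)
            ]
        z = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))


model = TinyElmanRNN(input_dim=2, hidden_dim=4)
print(model.forward([(0.5, 0.5)]))                          # one window
print(model.forward([(0.75, 0.0), (0.0, 0.5), (0.0, 1.0)]))  # three windows
```

Because the weights are random, the printed scores carry no meaning; a real detector would first train the network (for example with backpropagation through time) on labelled VLC sequences.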
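The four evaluation measures named in the abstract follow the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / N, and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them, assuming (as is common in spam filtering, but not stated in the abstract) that spam is the positive class, encoded as label `1`. The function name `spam_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, accuracy and F1, with label 1 = spam (assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

With four spam and four legitimate emails where the detector flags only two of the spam messages, precision is 1.0, recall is 0.5, accuracy is 0.75, and F1 is about 0.667, which shows why F1 is reported alongside accuracy.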
Keywords :
"Accuracy","Decision support systems","Training","Computational modeling","Neural networks","Electronic mail","Complexity theory"
Conference_Title :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN :
2161-4407
DOI :
10.1109/IJCNN.2015.7280346