Regional Information Center for Science and Technology (RICeST) - Variable length concentration based feature construction method for spam detection

DocumentCode :

3661039

Title :

Variable length concentration based feature construction method for spam detection

Author :

Yang Gao;Guyue Mi;Ying Tan

Author_Institution :

Key Laboratory of Machine Perception (MOE), Peking University, Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Beijing, 100871, China

fYear :

2015

fDate :

7/1/2015

Firstpage :

Lastpage :

Abstract :

In the field of spam detection, concentration methods have been proposed for feature construction in recent years; these methods convert emails into fixed-length feature vectors. This paper presents a novel method that removes the fixed-length restriction on the feature vector. Specifically, the method uses a fixed-length sliding window to divide each email into several sections, so the number of sections depends on the length of the email. Consequently, the feature vectors differ in length from one email to another, and this paper names them variable length concentrations (VLC). The method thus produces feature vectors that adapt to the length of each email. However, general classifiers are not suitable for this kind of feature vector, because they can handle only fixed-length inputs. As a result, this paper applies recurrent neural networks (RNNs), whose inputs are not restricted in length, to perform spam detection. Recall, precision, accuracy and F1 measure are used to evaluate the method's performance. Experimental results on the classic corpora PU1, PU2, PU3 and PUA show that VLC performs significantly better than previously proposed methods, which supports the effectiveness of the method.
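The sliding-window step described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not define how a concentration is computed, so the code assumes each section is summarized by two values: the fraction of its tokens found in a hypothetical spam term set and the fraction found in a hypothetical ham term set. The function name, the term sets, the non-overlapping window step, and the default window size are all assumptions introduced for illustration.

```python
from typing import List, Set, Tuple


def variable_length_concentrations(
    tokens: List[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    window_size: int = 5,
) -> List[Tuple[float, float]]:
    """Return one (spam_concentration, ham_concentration) pair per window.

    Assumption: a concentration is the fraction of tokens in a window that
    belong to a given term set. The paper may define it differently.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    features = []
    # Assumption: windows do not overlap, so the step equals the window size.
    for start in range(0, len(tokens), window_size):
        window = tokens[start:start + window_size]
        spam_hits = sum(1 for t in window if t in spam_terms)
        ham_hits = sum(1 for t in window if t in ham_terms)
        features.append((spam_hits / len(window), ham_hits / len(window)))
    return features


# Hypothetical term sets and emails, used only to show the varying lengths.
spam_terms = {"free", "win", "prize"}
ham_terms = {"meeting", "report", "agenda"}

short_email = "win a free prize now".split()
long_email = (
    "please send the report before the meeting and bring the agenda".split()
)

short_vec = variable_length_concentrations(short_email, spam_terms, ham_terms)
long_vec = variable_length_concentrations(long_email, spam_terms, ham_terms)

# The 5-token email yields 1 section and the 11-token email yields 3.
print(len(short_vec), len(long_vec))  # prints: 1 3
```

Because the number of sections grows with the email, the resulting sequences have different lengths, which is why the abstract turns to a sequence model such as an RNN instead of a classifier that expects a fixed input size.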

Keywords :

"Accuracy","Decision support systems","Training","Computational modeling","Neural networks","Electronic mail","Complexity theory"

Publisher :

IEEE

Conference_Title :

Neural Networks (IJCNN), 2015 International Joint Conference on

Electronic_ISBN :

2161-4407

Type :

conf

DOI :

10.1109/IJCNN.2015.7280346

Filename :

7280346

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3661039