DocumentCode :
1814261
Title :
An evaluation on the efficiency of hybrid feature selection in spam email classification
Author :
Mohamad, Masurah ; Selamat, Ali
Author_Institution :
Software Eng. Res. Group (SERG), Univ. Teknol. Malaysia, Johor Bahru, Malaysia
fYear :
2015
fDate :
21-23 April 2015
Firstpage :
227
Lastpage :
231
Abstract :
This paper discusses a spam filtering technique that combines two types of feature selection methods in its classification task. Spam, also known as unwanted messages, continues to flood electronic mailboxes despite the spam filtering systems provided by email service providers. The issue is frequently highlighted by Internet users and has attracted many researchers to work on fighting spam. A number of frameworks, algorithms, toolkits, systems and applications have been proposed, developed and applied by researchers and developers to protect users from spam. The classification task involves several steps, such as data pre-processing, feature selection, feature extraction, training and testing. Feature selection is one of the main processes in the classification task; it is used to reduce the dimensionality of the word-frequency features without affecting classification performance. Accordingly, we conducted an experiment to test the efficiency of the proposed Hybrid Feature Selection, which combines Term Frequency Inverse Document Frequency (TFIDF) with rough set theory, on the spam email classification problem. The results show that the proposed Hybrid Feature Selection returns good results.
Keywords :
Internet; feature selection; information filtering; pattern classification; security of data; unsolicited e-mail; TFIDF; data preprocessing; electronic mail boxes; email service provider; feature extraction; hybrid feature selection method; spam email classification problem; spam filtering technique; term frequency inverse document frequency; Accuracy; Filtering; Machine learning algorithms; Set theory; Testing; Unsolicited electronic mail; Spam; algorithm; rough set theory
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer, Communications, and Control Technology (I4CT), 2015 International Conference on
Conference_Location :
Kuching
Type :
conf
DOI :
10.1109/I4CT.2015.7219571
Filename :
7219571