Title of article :
Email Spam Detection Using Linear Discriminant Analysis Based on Clustering
Author/Authors :
Imani, Maryam Faculty of Electrical and Computer Engineering - Tarbiat Modares University, Tehran , Montazer, Gholam Ali Faculty of Electrical and Computer Engineering - Tarbiat Modares University, Tehran
Abstract :
The high volume of unwanted spam emails annoys Internet users and leads to abusive activity and financial losses. Spam detection is therefore
an important task in providing a secure electronic environment. Email spam databases usually have multimodal distributions with high overlap,
which makes it difficult to separate spam emails from normal emails. Moreover, the number of available labeled emails may be limited. A supervised
feature extraction method called cluster space linear discriminant analysis (CSLDA) is proposed in this paper to deal with these
difficulties. CSLDA exploits unlabeled testing samples in addition to labeled training samples to estimate the within-class and
between-class scatter matrices. Because email spam databases have multimodal distributions, CSLDA clusters the unlabeled testing data so that
they can be used in the learning phase of feature extraction. CSLDA uses the testing samples without determining their labels, relying only on
the relationship between training and testing samples obtained through clustering. The use of the Fisher criterion increases class discrimination.
Moreover, the use of clustered unlabeled samples addresses the small sample size problem and provides good performance for multimodal data.
The experimental results on the Spambase dataset indicate that CSLDA outperforms several popular and state-of-the-art feature
extraction and spam detection methods, especially in small sample size situations.
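As background for the Fisher criterion mentioned above, the following is a minimal sketch of the classical two-class Fisher discriminant, not of CSLDA itself: the abstract does not specify how the clustered testing samples enter the scatter estimates, so that step is omitted. The function names (`fisher_direction`, `project`) and the toy 2-D values are hypothetical and are not taken from the Spambase dataset.

```python
# Classical two-class Fisher discriminant on 2-D toy data (pure Python).
# Assumption: this illustrates plain LDA only; the cluster-based scatter
# estimation used by CSLDA is not described in the abstract and is not shown.

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    # Sum of outer products (x - mu)(x - mu)^T for 2-D samples.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    # w = S_w^{-1} (m1 - m0), the direction maximizing the Fisher criterion.
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        # With too few labeled samples S_w can be singular.
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    # One-dimensional extracted feature for sample x.
    return w[0] * x[0] + w[1] * x[1]

# Invented toy samples standing in for normal and spam feature vectors.
normal = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
spam = [[6.0, 5.0], [7.0, 6.0], [6.0, 7.0], [7.0, 8.0]]
w = fisher_direction(normal, spam)
```

Projecting each toy sample with `project(w, x)` places the two classes on disjoint intervals of the extracted feature. The singular-matrix branch marks where a shortage of labeled samples breaks the classical estimate, which is the situation the abstract says clustered unlabeled samples are meant to help with.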
Keywords :
Classification , Clustering , Discriminant analysis , Email spam
Journal title :
The CSI Journal on Computer Science and Engineering (JCSE)