Author :
Wu, Hao ; Li, Hong-zuo ; Wang, Gang ; Chen, Hui-ling ; Li, Xiao-kui
Abstract :
E-mail has brought about a major revolution over traditional communication systems because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communication, however, is the enormous volume of unwanted and harmful messages known as spam. In this paper, we propose a novel spam filtering framework (NSFF) based on particle swarm optimization, fuzzy logic control, F-score, and support vector machines (SVM). Within this framework, a fuzzy adaptive particle swarm optimization (FAPSO) algorithm searches for an optimal feature subset. To identify such a subset within a large dataset contaminated with high-dimensional noise, the method proceeds in three stages: core feature subset selection, feature subset selection, and spam filtering. In the first stage, the F-score of each feature is computed to measure its importance, and these scores are used to construct a number of core feature subsets. In the second stage, FAPSO is initialized from the core feature subsets and adjusted adaptively via fuzzy logic control, yielding an optimal feature subset. In the final stage, an SVM classifies the input e-mails using the features in this optimal subset. Experiments are conducted on three publicly available benchmark corpora for spam filtering: PU1, Ling-Spam, and SpamAssassin. The numerical results and statistical analysis show that the proposed approach is capable of finding an optimal feature subset in a large, noisy dataset. In addition, NSFF achieves significantly higher prediction accuracy than the other compared methods while using a smaller feature subset.
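The first two stages described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the two-class F-score commonly used for SVM feature ranking (between-class separation of the class means divided by the sum of the within-class sample variances), and the helper names `f_scores`, `core_feature_subset`, and `init_particles` are hypothetical. The particle initializer simply switches on every core feature and adds other features at random with a hypothetical probability `p_extra`; the fuzzy adaptation of FAPSO and the SVM fitness evaluation are not shown.

```python
import random
from statistics import mean
from typing import List, Sequence


def f_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Two-class F-score per feature column; y holds 1 (spam) or 0 (ham)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")
    scores = []
    for j in range(len(X[0])):
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        m_all = mean(row[j] for row in X)
        m_pos, m_neg = mean(col_pos), mean(col_neg)
        # Numerator: how far each class mean lies from the overall mean.
        between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        # Denominator: sum of the unbiased within-class variances.
        within = (sum((v - m_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
                  + sum((v - m_neg) ** 2 for v in col_neg) / (len(col_neg) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def core_feature_subset(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (hypothetical core set)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def init_particles(n_features: int, core: Sequence[int], n_particles: int,
                   p_extra: float = 0.1, seed: int = 0) -> List[List[int]]:
    """Binary particles (1 = feature selected) seeded from the core subset."""
    rng = random.Random(seed)
    core_set = set(core)
    return [[1 if j in core_set or rng.random() < p_extra else 0
             for j in range(n_features)]
            for _ in range(n_particles)]


if __name__ == "__main__":
    X = [[1.0, 0.5, 3.0], [1.2, 0.4, 1.0], [0.1, 0.6, 2.0], [0.0, 0.5, 2.5]]
    y = [1, 1, 0, 0]
    s = f_scores(X, y)
    core = core_feature_subset(s, 2)
    print(s, core, init_particles(len(s), core, 3))
```

On the toy data, feature 0 separates the two classes far better than the others, so it ranks first. In a full wrapper, each particle's bit vector would select columns for an SVM whose validation accuracy serves as the fitness.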
Keywords :
benchmark testing; e-mail filters; fuzzy control; particle swarm optimisation; pattern classification; set theory; support vector machines; unsolicited e-mail; F-score; SVM classifier; benchmark corpora; e-mail; electronic communication; feature subset selection; fuzzy adaptive particle swarm optimization; fuzzy logic control; spam filtering framework; support vector machine; Accuracy; Filtering; Frequency modulation; Optical fibers; Particle swarm optimization; Unsolicited electronic mail; feature selection; spam filtering