Title :
A multi-resolution-concentration based feature construction approach for spam filtering
Author :
Guyue Mi ; Pengtao Zhang ; Ying Tan
Author_Institution :
Dept. of Machine Intell., Peking Univ., Beijing, China
Abstract :
This paper proposes a multi-resolution-concentration (MRC) based feature construction approach for spam filtering that progressively partitions an email into local areas at finer and finer resolutions. The MRC approach models a dynamic process of gradual refinement in locating the pathogens by calculating the concentrations of detectors on the local areas, and is considered able to extract position-correlated and process-correlated information from emails. Furthermore, a weighted MRC (WMRC) approach is presented that accounts for the different activity levels of detectors when calculating concentrations. A generic structure of the MRC model, consisting mainly of detector set construction and multi-resolution concentration calculation, is designed, and the implementations of the MRC and WMRC approaches are described in detail. Experiments are conducted on five benchmark corpora using cross-validation to evaluate the proposed MRC model. Comprehensive experimental results suggest that the MRC and WMRC approaches far outperform the prevalent bag-of-words approach in both classification performance and efficiency. Compared with the concentration based feature construction approach and the local-concentration based feature extraction approach, MRC and WMRC achieve higher accuracy and F1 measure, which demonstrates the effectiveness of the MRC model. In addition, both the MRC and WMRC approaches are shown to work well with a variety of classification methods, which gives the MRC model the flexibility needed for real-world use.
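The abstract does not say how detector sets are built or exactly how an email is partitioned at each resolution, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes: two hypothetical detector sets (terms associated with spam and with legitimate mail); resolution level k splitting the token sequence into 2^k contiguous, near-equal areas; and the concentration of a detector set on an area defined as the fraction of the area's tokens it matches. For the WMRC variant, a hypothetical per-detector weight (standing in for "activity level") replaces the count of 1 per match. All function names and parameters are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Set


def split_into_segments(tokens: Sequence[str], n: int) -> List[Sequence[str]]:
    """Split tokens into n contiguous, near-equal areas (assumed partition scheme).

    Boundaries i*size//n make area lengths differ by at most 1; short emails
    can yield empty areas.
    """
    size = len(tokens)
    bounds = [i * size // n for i in range(n + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n)]


def concentration(segment: Sequence[str], detectors: Set[str],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Detector concentration on one area.

    Unweighted (MRC-style sketch): matched tokens / area length.
    Weighted (WMRC-style sketch): sum of hypothetical detector weights of
    matched tokens / area length; detectors without a weight count as 1.0.
    """
    if not segment:
        return 0.0
    if weights is None:
        score = sum(1.0 for t in segment if t in detectors)
    else:
        score = sum(weights.get(t, 1.0) for t in segment if t in detectors)
    return score / len(segment)


def mrc_features(tokens: Sequence[str], spam_detectors: Set[str],
                 ham_detectors: Set[str], levels: int = 3,
                 weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Concatenate (spam, ham) concentrations for every area at every level.

    Level k (0-based) uses 2**k areas, so the vector has
    2 * (2**levels - 1) entries.
    """
    features: List[float] = []
    for level in range(levels):
        for seg in split_into_segments(tokens, 2 ** level):
            features.append(concentration(seg, spam_detectors, weights))
            features.append(concentration(seg, ham_detectors, weights))
    return features
```

For example, with tokens `"free money click now meeting agenda attached".split()`, spam detectors `{"free", "money", "click"}` and ham detectors `{"meeting", "agenda"}`, level 0 gives concentrations 3/7 and 2/7 over the whole email, while level 1 separates a spam-dense first half (1.0, 0.0) from a ham-leaning second half (0.0, 0.5). That positional contrast is the kind of information a single global bag-of-words vector does not keep. The resulting fixed-length vector can be fed to any standard classifier.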
Keywords :
e-mail filters; feature extraction; information filtering; unsolicited e-mail; MRC model; email; feature construction approach; gradual refinement; multiresolution-concentration; position-correlated information extraction; process-correlated information extraction; spam filtering; Accuracy; Detectors; Feature extraction; Pathogens; Unsolicited electronic mail; Vectors;
Conference_Title :
Neural Networks (IJCNN), The 2013 International Joint Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-6128-6
DOI :
10.1109/IJCNN.2013.6706876