• DocumentCode
    671536
  • Title
    A multi-resolution-concentration based feature construction approach for spam filtering
  • Author
    Guyue Mi ; Pengtao Zhang ; Ying Tan
  • Author_Institution
    Dept. of Machine Intell., Peking Univ., Beijing, China
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper proposes a multi-resolution-concentration (MRC) based feature construction approach for spam filtering by progressively partitioning an email into local areas at smaller and smaller resolutions. The MRC approach depicts a dynamic process of gradual refinement in locating the pathogens by calculating concentrations of detectors on local areas, and is considered to be able to extract the position-correlated and process-correlated information from emails. Furthermore, a weighted MRC (WMRC) approach is presented by considering the different activity levels of detectors in the calculation of concentrations. A generic structure of the MRC model, which mainly contains detector set construction and multi-resolution concentration calculation, is designed. The implementations of the MRC and WMRC approaches are described in detail. Experiments are conducted on five benchmark corpora using cross-validation to evaluate the proposed MRC model. Comprehensive experimental results suggest that the MRC and WMRC approaches far outperform the prevalent bag-of-words approach in both classification performance and efficiency. Compared with the concentration based feature construction approach and the local-concentration based feature extraction approach, MRC and WMRC achieve higher accuracy and F1 measure, which demonstrates the effectiveness of the MRC model. In addition, it is shown that both the MRC and WMRC approaches cooperate well with a variety of classification methods, which endows the MRC model with flexible capability in the real world.
  • Keywords
    e-mail filters; feature extraction; information filtering; unsolicited e-mail; MRC model; email; feature construction approach; gradual refinement; multiresolution-concentration; position-correlated information extraction; process-correlated information extraction; spam filtering; Accuracy; Detectors; Feature extraction; Pathogens; Unsolicited electronic mail; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2013.6706876
  • Filename
    6706876