• DocumentCode
    3314445
  • Title
    A MapReduce based parallel SVM for large scale spam filtering
  • Author
    Caruana, G. ; Maozhen Li ; Man Qi
  • Author_Institution
    Sch. of Eng. & Design, Brunel Univ., Uxbridge, UK
  • Volume
    4
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    2659
  • Lastpage
    2662
  • Abstract
    Spam continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) based techniques, have been proposed for spam classification. However, SVM training is a computationally intensive process. This paper presents a parallel SVM algorithm for scalable spam filtering. By distributing, processing and optimizing subsets of the training data across multiple participating nodes, the distributed SVM reduces the training time significantly. Ontology-based concepts are also employed to minimize the accuracy degradation that arises when the training data is distributed amongst the SVM classifiers.
  • Keywords
    ontologies (artificial intelligence); parallel algorithms; pattern classification; support vector machines; unsolicited e-mail; MapReduce; SVM classifiers; large scale spam filtering; ontology; parallel SVM algorithm; spam classification; Accuracy; Filtering; Machine learning; Ontologies; Support vector machines; Training; Training data; Classification; Machine Learning; Ontology Semantics; Parallel Computing; Support Vector Machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6020074
  • Filename
    6020074