• DocumentCode
    2084862
  • Title

    Detecting phishing e-mails using text and data mining

  • Author

    Pandey, Manjusha ; Ravi, Vignesh

  • Author_Institution
    Inst. for Dev. & Res. in Banking Technol., Hyderabad, India
  • fYear
    2012
  • fDate
    18-20 Dec. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper applies text mining and data mining in tandem to detect phishing emails. The study employs Multilayer Perceptron (MLP), Decision Trees (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Net (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing emails is analyzed after extracting 23 keywords from the email bodies of the original dataset using text mining. Further, we selected the 12 most important features using t-statistic based feature selection. A t-test at the 1% level of significance showed no statistically significant difference in sensitivity with and without feature selection for any technique except PNN. Since GP and DT are not statistically significantly different from each other, either with or without feature selection, at the 1% level of significance, DT should be preferred because it yields 'if-then' rules, thereby increasing the comprehensibility of the system.
  • Keywords
    computer crime; data mining; decision trees; genetic algorithms; multilayer perceptrons; pattern classification; probability; regression analysis; support vector machines; text analysis; unsolicited e-mail; DT; GMDH; GP; LR; MLP; PNN; SVM; classification; data mining; decision trees; genetic programming; group method of data handling; if-then rules; keyword extraction; logistic regression; multilayer perceptron; phishing e-mail detection; probabilistic neural net; support vector machine; t-statistic based feature selection; text mining; Classification; Decision Tree; Genetic Programming; Group Method Of Data Handling; Logistic regression; Multilayer Perceptron; Phishing webpage; Probabilistic Neural Network; Support Vector Machine; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence & Computing Research (ICCIC), 2012 IEEE International Conference on
  • Conference_Location
    Coimbatore
  • Print_ISBN
    978-1-4673-1342-1
  • Type

    conf

  • DOI
    10.1109/ICCIC.2012.6510259
  • Filename
    6510259