• DocumentCode
    3153647
  • Title

    Information extraction from spam emails using stylistic and semantic features to identify spammers

  • Author

    Halder, Soma ; Tiwari, Richa ; Sprague, Alan

  • Author_Institution
    Univ. of Alabama at Birmingham, Birmingham, AL, USA
  • fYear
    2011
  • fDate
    3-5 Aug. 2011
  • Firstpage
    104
  • Lastpage
    107
  • Abstract
    Traditional anti-spam methods filter spam emails and keep them out of the inbox, but take no measures to trace and penalize spammers. We use natural language processing techniques to cluster spam emails from the same spammer based on the content and style of the email. Spam emails from different sources are studied using stylistic features, semantic features, and a combination of both. Three sets of clusterings are performed: clustering based on stylistic features, clustering based on semantic features, and clustering based on combined features. These clusters are then compared and evaluated. We observe that spam emails from the same source are similar and cluster together. These emails contain URLs of the web pages that the spammer is trying to promote. Clusters are mapped to the Internet Protocol (IP) addresses of these URLs, and the WHOIS information for those IP addresses helps identify the source of the spam.
  • Keywords
    Web sites; data mining; e-mail filters; feature extraction; natural language processing; pattern clustering; semantic Web; unsolicited e-mail; IP addresses; URL; Web pages; anti spamming methods; clustering; information extraction; internet protocol; natural language processing; spam email filters; spammers; stylistic semantic features; Clustering algorithms; Data mining; Feature extraction; IP networks; Semantics; Unsolicited electronic mail; IP address; Spam; natural language processing; semantics; stylistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2011 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4577-0964-7
  • Electronic_ISBN
    978-1-4577-0965-4
  • Type

    conf

  • DOI
    10.1109/IRI.2011.6009529
  • Filename
    6009529
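
The pipeline outlined in the abstract (feature extraction, clustering, then mapping clusters to the URLs they promote) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the features or the clustering algorithm, so the three style measures, the bag-of-words semantic vector, the cosine threshold, and the greedy single-pass clustering below are all assumptions chosen for brevity. The function names and sample emails are hypothetical. Resolving hosts to IP addresses and querying WHOIS need network access and are left out.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed URL pattern; real spam often obfuscates links.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_hosts(text):
    """Return the lower-cased host names of all URLs in an email body."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def stylistic_features(text):
    """Hypothetical style features (not taken from the paper)."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    return {
        "style:avg_word_len": sum(map(len, words)) / max(len(words), 1),
        "style:upper_ratio": sum(c.isupper() for c in letters) / max(len(letters), 1),
        "style:punct_ratio": sum(c in "!?.,;:$" for c in text) / max(len(text), 1),
    }


def semantic_features(text):
    """Bag-of-words counts over the body with URLs removed (an assumption)."""
    body = URL_RE.sub(" ", text)
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]{3,}", body))
    return {f"word:{w}": float(c) for w, c in counts.items()}


def combined_features(text):
    """Merge both feature sets; real use would rescale them first."""
    return {**stylistic_features(text), **semantic_features(text)}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-pass clustering: an email joins the first cluster
    whose first member is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up sample emails for illustration only.
emails = [
    "CHEAP pills!!! Buy cheap pills now at http://pills.example.com/buy",
    "Buy CHEAP pills today!!! Cheap pills here: http://pills.example.com/order",
    "Dear friend, your loan application was approved. Visit http://loans.example.org/apply",
]

clusters = cluster([semantic_features(e) for e in emails])
# Map each cluster to the hosts its emails promote.
cluster_hosts = [set().union(*(extract_hosts(emails[i]) for i in c)) for c in clusters]
```

With these sample emails, the two pill advertisements fall into one cluster that maps to pills.example.com, and the loan email forms its own cluster. A later step, under the same assumptions, would resolve each host to an IP address (for instance with socket.gethostbyname) and look up the WHOIS record for that address.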