• DocumentCode
    541963
  • Title

    Don't follow me: Spam detection in Twitter

  • Author

    Wang, Alex Hai

  • Author_Institution
    College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, U.S.A.
  • fYear
    2010
  • fDate
    26-28 July 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    The rapidly growing social network Twitter has been infiltrated by a large amount of spam. In this paper, a spam detection prototype system is proposed to identify suspicious users on Twitter. A directed social graph model is proposed to explore the “follower” and “friend” relationships among Twitter users. Based on Twitter's spam policy, novel content-based and graph-based features are also proposed to facilitate spam detection. A Web crawler is developed relying on API methods provided by Twitter. In total, around 25K users, 500K tweets, and 49M follower/friend relationships are collected from publicly available data on Twitter. A Bayesian classification algorithm is applied to distinguish suspicious behaviors from normal ones. I analyze the data set and evaluate the performance of the detection system. Classic evaluation metrics are used to compare the performance of various traditional classification methods. Experimental results show that the Bayesian classifier has the best overall performance in terms of F-measure. The trained classifier is also applied to the entire data set. The result shows that the spam detection system can achieve 89% precision.
  • Keywords
    Bayesian methods; Crawlers; Feature extraction; Twitter; Unsolicited electronic mail; Classification; Machine learning; Social network security; Spam detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on
  • Conference_Location
    Athens, Greece
  • Electronic_ISBN
    978-989-8425-18-8
  • Type

    conf

  • Filename
    5741690