• DocumentCode
    58944
  • Title
    A Probabilistic Generative Model for Mining Cybercriminal Networks from Online Social Media
  • Author
    Lau, Raymond Y. K. ; Yunqing Xia ; Yunming Ye
  • Author_Institution
    Dept. of Inf. Syst., City Univ. of Hong Kong, Hong Kong, China
  • Volume
    9
  • Issue
    1
  • fYear
    2014
  • fDate
    Feb. 2014
  • Firstpage
    31
  • Lastpage
    43
  • Abstract
    There has been rapid growth in the number of cybercrimes that cause tremendous financial loss to organizations. Recent studies reveal that cybercriminals tend to collaborate, and even trade cyber-attack tools, via the "dark markets" established in online social media. This presents researchers with unprecedented opportunities to tap into these underground cybercriminal communities and gain better insight into collaborative cybercrime activities, so as to combat the ever-increasing number of cybercrimes. The main contribution of this paper is the development of a novel weakly supervised cybercriminal network mining method to facilitate cybercrime forensics. In particular, the proposed method is underpinned by a probabilistic generative model enhanced by a novel context-sensitive Gibbs sampling algorithm. Experiments on two social media corpora show that the proposed method significantly outperforms the Latent Dirichlet Allocation (LDA) based method and the Support Vector Machine (SVM) based method by 5.23% and 16.62%, respectively, in terms of Area Under the ROC Curve (AUC). It also achieves performance comparable to that of the state-of-the-art Partially Labeled Dirichlet Allocation (PLDA) method. To the best of our knowledge, this is the first successful application of a probabilistic generative model to mining cybercriminal networks from online social media.
  • Keywords
    data mining; digital forensics; sampling methods; social networking (online); AUC; PLDA method; area under the ROC curve; collaborative cybercrime activities; context-sensitive Gibbs sampling algorithm; cyber-attack tools; cybercrime forensics; dark markets; online social media; partially labeled Dirichlet allocation method; probabilistic generative model; social media corpora; supervised cybercriminal network mining method; underground cybercriminal communities; Computer crime; Computer security; Data mining; Hackers; Natural language processing; Network security; Probabilistic logic; Social network services;
  • fLanguage
    English
  • Journal_Title
    Computational Intelligence Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    1556-603X
  • Type
    jour
  • DOI
    10.1109/MCI.2013.2291689
  • Filename
    6710252