Title : 
Doppelgänger Finder: Taking Stylometry to the Underground
         
Author :
Afroz, Sadia; Islam, Aylin Caliskan; Stolerman, Ariel; Greenstadt, Rachel; McCoy, Damon
         
Author_Institution :
Univ. of California, Berkeley, Berkeley, CA, USA
         
Abstract :
Stylometry is a method for identifying the authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and about the size and scope of the cybercrime underworld. Previous analyses have relied primarily on limited structured metadata and painstaking manual analysis. However, this labor-intensive manual approach clearly does not scale, so the key challenge is to automate the process. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard supervised stylometry problem, made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, an analyst feeds a forum into an analysis engine and has it output possible doppelgangers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to datasets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study that applies our technique to the Carders forum and uses manual analysis to validate the results, enabling the discovery of previously undetected doppelganger accounts.
         
Keywords :
Internet; computer crime; meta data; standards; Carders forum; Doppelganger Finder; anonymous author identification; anonymous texts; cybercrime underworld; cybercriminal; data dumps; l33t-speak conversations; multilingual forums; security research community; structured metadata; stylometric methods; supervised stylometry problem; underground forum analysis; writing style analysis; Accuracy; Blogs; Detectors; Electronic mail; Manuals; Social network services; Stylometry; cybercrime; underground forum;
         
Conference_Title :
Security and Privacy (SP), 2014 IEEE Symposium on
         
Conference_Location :
San Jose, CA