DocumentCode
1785293
Title
CrowdSource: Automated inference of high level malware functionality from low-level symbols using a crowd trained machine learning model
Author
Saxe, Joshua ; Turner, Richard ; Blokhin, Kristina
fYear
2014
fDate
28-30 Oct. 2014
Firstpage
68
Lastpage
75
Abstract
In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource “learns” a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware.
Keywords
Internet; inference mechanisms; invasive software; learning (artificial intelligence); natural language processing; text analysis; CrowdSource; StackExchange; Web technical documents; automated inference; commodity hardware; crowd trained machine learning model; f-score; high level malware functionality inference; high-level software functionality; low-level language; low-level symbols; printable character string extraction; statistical natural language processing system; unpacked malware binaries; Accuracy; Bayes methods; Indexes; Malware; Probabilistic logic; Protocols; Software; Malicious applications; application security; computer security; network security; reverse engineering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Malicious and Unwanted Software: The Americas (MALWARE), 2014 9th International Conference on
Conference_Location
Fajardo, PR
Print_ISBN
978-1-4799-7328-6
Type
conf
DOI
10.1109/MALWARE.2014.6999417
Filename
6999417