DocumentCode :
1827938
Title :
Streaming Malware Classification in the Presence of Concept Drift and Class Imbalance
Author :
Kegelmeyer, W. Philip ; Chiang, Ken ; Ingram, Joe
Author_Institution :
Sandia Nat. Labs. Livermore, Livermore, CA, USA
Volume :
2
fYear :
2013
fDate :
4-7 Dec. 2013
Firstpage :
48
Lastpage :
53
Abstract :
Malware, or malicious software, is capable of performing any action or command that can be expressed in code and is typically used for illicit activities, such as e-mail spamming, corporate espionage, and identity theft. Most organizations rely on anti-virus software to identify malware; such software typically uses signatures that can identify only previously seen malware instances. We consider the detection of malware executables that are downloaded in streaming network data as a supervised machine learning problem. Using malware data collected over multiple years, we characterize the effect of concept drift and class imbalance on batch and streaming decision tree ensembles. In particular, we illustrate a surprising vulnerability generated by precisely the aspect of streaming methods that seemed most likely to help them, when compared to batch methods.
Keywords :
decision trees; invasive software; learning (artificial intelligence); antivirus software; batch decision tree ensembles; class imbalance; concept drift; corporate espionage; e-mail spamming; identity theft; malicious software; malware detection; malware identification; streaming decision tree ensembles; streaming malware classification; supervised machine learning problem; Accuracy; Bagging; Data models; Decision trees; Malware; Software; Standards;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
Type :
conf
DOI :
10.1109/ICMLA.2013.104
Filename :
6786080