Title :
Streaming Malware Classification in the Presence of Concept Drift and Class Imbalance
Author :
Kegelmeyer, W. Philip ; Chiang, Ken ; Ingram, Joe
Author_Institution :
Sandia Nat. Labs., Livermore, CA, USA
Abstract :
Malware, or malicious software, is capable of performing any action or command that can be expressed in code and is typically used for illicit activities, such as e-mail spamming, corporate espionage, and identity theft. Most organizations rely on anti-virus software to identify malware; such software typically uses signatures that can identify only previously seen malware instances. We consider the detection of malware executables that are downloaded in streaming network data as a supervised machine learning problem. Using malware data collected over multiple years, we characterize the effect of concept drift and class imbalance on batch and streaming decision tree ensembles. In particular, we illustrate a surprising vulnerability generated by precisely the aspect of streaming methods that seemed most likely to help them, when compared to batch methods.
Keywords :
decision trees; invasive software; learning (artificial intelligence); antivirus software; batch decision tree ensembles; class imbalance; concept drift; corporate espionage; e-mail spamming; identity theft; malicious software; malware detection; malware identification; streaming decision tree ensembles; streaming malware classification; supervised machine learning problem; Accuracy; Bagging; Data models; Decision trees; Malware; Software; Standards;
Conference_Title :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICMLA.2013.104