DocumentCode :
1974647
Title :
ARIGUMA Code Analyzer: Efficient Variant Detection by Identifying Common Instruction Sequences in Malware Families
Author :
Yang Zhong ; Yamaki, Hirofumi ; Yamaguchi, Yoshio ; Takakura, Hiroki
Author_Institution :
Grad. Sch. of Inf. Sci., Nagoya Univ., Nagoya, Japan
fYear :
2013
fDate :
22-26 July 2013
Firstpage :
11
Lastpage :
20
Abstract :
The first step of malware analysis is to determine whether a given malware program is a variant of a known one. If it is clearly not a variant, it must be analyzed manually. However, manual analysis is very costly, so it cannot be applied to the enormous number of newly found malware programs. An automatic and accurate method for classifying malware programs would help address this situation. Existing methods suffer from problems such as the cost of calculating the similarity between every pair of malware programs in a database and the inability to present precisely the similarities and differences between programs. In our approach, known malware programs are classified into families, and a given malware program is judged to be a variant if it is classified into an existing family. Incremental clustering is then performed on the new program and that family, which reduces the cost of re-training and similarity calculation. Programs are compared accurately by evaluating the difference between them using the longest common subsequences (LCSs) of their instructions. To reduce the amount of costly LCS calculation, numeric features of the code, such as cyclomatic complexity and the number of function calls, are used to filter out dissimilar code. Subsequences in the LCS of two pieces of code are presented to malware analysts as the similarity between them, while those outside it are presented as the difference. Experimental results show that this method can detect the names of APIs used in malware, which existing methods cannot; that it is useful for identifying inserted code, which is used to generate variants that evade pattern detection by anti-virus software; and that it reduces the time needed to process malware programs without degrading classification accuracy.
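The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the ARIGUMA implementation: the function names (`lcs_pairs`, `features_close`, `compare`), the feature tuple layout (cyclomatic complexity, number of function calls), the relative `tolerance` threshold, and the similarity ratio are all assumptions made for illustration. The sketch first filters on numeric features, then computes an LCS of the instruction lists and reports the instructions inside the LCS as the similarity and those outside it as the difference.

```python
from typing import List, Optional, Sequence, Tuple


def lcs_pairs(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def features_close(f1: Sequence[float], f2: Sequence[float],
                   tolerance: float = 0.5) -> bool:
    """Hypothetical pre-filter: every numeric feature must agree within a
    relative tolerance, otherwise the costly LCS step is skipped."""
    return all(abs(x - y) <= tolerance * max(x, y, 1) for x, y in zip(f1, f2))


def compare(code_a: Sequence[str], feat_a: Sequence[float],
            code_b: Sequence[str], feat_b: Sequence[float],
            tolerance: float = 0.5
            ) -> Optional[Tuple[List[str], List[str], float]]:
    """Return (common, only_in_b, similarity), or None if filtered out."""
    if not features_close(feat_a, feat_b, tolerance):
        return None
    pairs = lcs_pairs(code_a, code_b)
    matched_b = {j for _, j in pairs}
    common = [code_a[i] for i, _ in pairs]           # shown as the similarity
    only_in_b = [ins for j, ins in enumerate(code_b)
                 if j not in matched_b]              # shown as the difference
    # Assumed similarity measure: LCS length over the longer sequence.
    similarity = len(common) / max(len(code_a), len(code_b), 1)
    return common, only_in_b, similarity


known = ["push ebp", "mov ebp, esp", "call CreateFileA", "ret"]
sample = ["push ebp", "nop", "mov ebp, esp", "nop", "call CreateFileA", "ret"]
result = compare(known, (1, 1), sample, (1, 1))
```

In this toy input, `sample` is `known` with two `nop` instructions inserted, so the instructions outside the LCS are exactly the inserted ones. The dynamic-programming table costs time and memory proportional to the product of the two sequence lengths, which is why a cheap numeric pre-filter is worth applying first.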
Keywords :
application program interfaces; invasive software; pattern classification; pattern clustering; API; ARIGUMA code analyzer; LCS; common instruction sequence identification; cyclomatic complexity; function calls; incremental clustering; longest common subsequences; malware analysis; malware families; malware program classification; variant detection; Clustering algorithms; Databases; Feature extraction; Malware; Manuals; Training; Vectors; LCS; incremental clustering; malware classification; static analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Software and Applications Conference (COMPSAC), 2013 IEEE 37th Annual
Conference_Location :
Kyoto
Type :
conf
DOI :
10.1109/COMPSAC.2013.6
Filename :
6649793