DocumentCode :
243474
Title :
Evolving Decision Trees for the Categorization of Software
Author :
Hosic, Jasenko ; Tauritz, Daniel R. ; Mulder, Samuel A.
Author_Institution :
Dept. of Comput. Sci., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
fYear :
2014
fDate :
21-25 July 2014
Firstpage :
337
Lastpage :
342
Abstract :
Current manual techniques of static reverse engineering are inefficient at providing semantic program understanding. We have developed an automated method to categorize applications in order to quickly determine pertinent characteristics. Prior work in this area has had some success, but a major strength of our approach is that it produces heuristics that can be reused for quick analysis of new data. Our method relies on a genetic programming algorithm to evolve decision trees which can be used to categorize software. The terminals, or leaf nodes, within the trees each contain values based on selected features from one of several attributes: system calls, byte n-grams, opcode n-grams, cyclomatic complexity, and bonding. The evolved decision trees are reusable and achieve average accuracies above 95% when categorizing programs based on compiler origin and versions. Developing new decision trees simply requires more labeled datasets and potentially different feature selection algorithms for other attributes, depending on the data being classified.
Keywords :
classification; decision trees; feature selection; software engineering; bonding; byte n-grams; cyclomatic complexity; feature selection algorithms; genetic programming algorithm; opcode n-grams; software categorization; system calls; Accuracy; Complexity theory; Histograms; Software; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International
Conference_Location :
Vasteras
Type :
conf
DOI :
10.1109/COMPSACW.2014.59
Filename :
6903152