Title :
Unsupervised Software Categorization Using Bytecode
Author :
Escobar-Avila, Javier ; Linares-Vasquez, Mario ; Haiduc, Sonia
Abstract :
Automatic software categorization is the task of assigning software systems or libraries to categories based on their functionality. Assigning these categories correctly is essential so that developers can easily retrieve relevant software from large repositories. State-of-the-art approaches either rely on the availability of source code or use supervised machine learning, which requires a set of already labeled software as training data. These requirements cause current approaches to fail when such information is not available. We propose a novel approach that overcomes these limitations by using semantic information recovered from bytecode and an unsupervised algorithm to assign categories to software systems. We evaluated our approach in a study on the Apache Foundation Repository of Java libraries, and the results indicate that it identifies a correct category for 86% of the libraries.
Keywords :
Accuracy; Clustering algorithms; Data mining; Java; Software; Software libraries; bytecode; clustering; Dirichlet process; software categorization; software profiles
Conference_Title :
Program Comprehension (ICPC), 2015 IEEE 23rd International Conference on
Conference_Location :
Florence, Italy
DOI :
10.1109/ICPC.2015.33