DocumentCode :
244993
Title :
Heterogeneous Metric Learning with Content-Based Regularization for Software Artifact Retrieval
Author :
Liang Wu ; Liang Du ; Bo Liu ; Guandong Xu ; Yong Ge ; Yanjie Fu ; Jianhui Li ; Yuanchun Zhou ; Hui Xiong
Author_Institution :
Comput. Network Inf. Center, Beijing, China
fYear :
2014
fDate :
14-17 Dec. 2014
Firstpage :
610
Lastpage :
619
Abstract :
Software artifact retrieval aims to locate software artifacts, such as a piece of source code, in a large code repository. The problem has traditionally been addressed through textual queries: information retrieval techniques are applied based on the textual similarity between a query and a textual representation of each software artifact, which is generated by collecting words from the comments, identifiers, and descriptions of programs. However, in addition to this semantic information, rich information is embedded in the source code itself. If analyzed properly, the source code can be a rich resource for improving software artifact retrieval. To this end, we develop a feature extraction method for source code. Specifically, the method captures both the inherent information in the source code and the semantic information hidden in its comments, descriptions, and identifiers. Moreover, we design a heterogeneous metric learning approach that integrates code features and text features into the same latent semantic space. This, in turn, helps measure artifact similarity by exploiting the joint power of both code and text features. Finally, extensive experiments on real-world data show that the proposed method improves the performance of software artifact retrieval by a significant margin.
Keywords :
feature extraction; learning (artificial intelligence); query processing; source code (software); text analysis; artifact similarity; code feature; code repository; content-based regularization; feature extraction method; heterogeneous metric learning approach; information retrieval technique; latent semantic space; semantic information; software artifact retrieval; source code; text feature; textual query; textual representation; textual similarity; Electronic mail; Feature extraction; Information retrieval; Measurement; Optimization; Semantics; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
ISSN :
1550-4786
Print_ISBN :
978-1-4799-4303-6
Type :
conf
DOI :
10.1109/ICDM.2014.147
Filename :
7023378