Title :
Structural information based term weighting in text retrieval for feature location
Author :
Bassett, Blake ; Kraft, Nicholas A.
Author_Institution :
Dept. of Comput. Sci., Univ. of Alabama, Tuscaloosa, AL, USA
Abstract :
Many recent feature location techniques (FLTs) apply text retrieval (TR) techniques to corpora built from text embedded in source code. Term weighting is a standard preprocessing step in TR and is used to adjust the importance of a term within a document or corpus. Common term weighting schemes such as tf-idf may not be optimal for use with source code, because they originate from a natural language context and were designed for use with unstructured documents. In this paper, we propose a new approach to term weighting in which term weights are assigned using the structural information from the source code. We then evaluate the proposed approach by conducting an empirical study of a TR-based FLT. In all, we study over 400 bugs and features from five open source Java systems and find that structural term weighting can cause a statistically significant improvement in the accuracy of the FLT.
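As a rough illustration of the idea in the abstract, the sketch below scales a term's count by the structural role it plays in the source code before applying a standard idf factor. It is not the authors' method: the role names, the multiplier values in `ROLE_WEIGHTS`, and the helper functions are all hypothetical, and tf-idf is used here only because the abstract names it as the common baseline scheme.

```python
import math
from collections import Counter

# Hypothetical multipliers per structural role; the values are illustrative only.
ROLE_WEIGHTS = {"method_name": 3.0, "identifier": 1.5, "comment": 1.0}

def weighted_tf(doc):
    """doc is a list of (term, role) pairs; returns role-weighted term counts."""
    counts = Counter()
    for term, role in doc:
        # Each occurrence contributes its role's multiplier instead of a flat 1.
        counts[term] += ROLE_WEIGHTS.get(role, 1.0)
    return counts

def tf_idf(corpus):
    """Return one {term: weight} dict per document in the corpus."""
    tfs = [weighted_tf(doc) for doc in corpus]
    n_docs = len(corpus)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tf in tfs for term in tf)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]

# Two toy "documents", e.g. the terms extracted from two methods.
corpus = [
    [("parse", "method_name"), ("file", "identifier"), ("file", "comment")],
    [("save", "method_name"), ("file", "identifier")],
]
weights = tf_idf(corpus)
```

In this toy corpus, a term that occurs in every document (`file`) receives weight zero whatever its role, while a term that occurs only as a method name in one document is boosted by the method-name multiplier.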
Keywords :
Java; feature extraction; information retrieval; natural language processing; public domain software; source coding; text analysis; TR-based FLT; corpus; feature location technique; natural language; open source Java system; source code; structural term weighting; term weight assignment; term weighting scheme; text embedding; text retrieval; unstructured document; Accuracy; Benchmark testing; Computational modeling; Indexing; Large scale integration; Probability distribution; Program comprehension; feature location; latent Dirichlet allocation; static analysis
Conference_Title :
Program Comprehension (ICPC), 2013 IEEE 21st International Conference on
Conference_Location :
San Francisco, CA
DOI :
10.1109/ICPC.2013.6613841