DocumentCode
2867150
Title
Vocabulary normalization improves IR-based concept location
Author
Binkley, David ; Lawrie, Dawn ; Uehlinger, C.
Author_Institution
Comput. Sci. Dept., Loyola Univ. Maryland, Baltimore, MD, USA
fYear
2012
fDate
23-28 Sept. 2012
Firstpage
588
Lastpage
591
Abstract
Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and more recently tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code, but artifacts from all phases of software construction and its subsequent evolution. Unfortunately, the natural language found in source code often uses a vocabulary different from that used in other software artifacts and thus increases the vocabulary mismatch problem. This problem exists because many natural-language tools imported from Information Retrieval (IR) and Natural Language Processing (NLP) implicitly assume the use of a single natural language vocabulary. Vocabulary normalization, which goes well beyond simple identifier splitting, brings the vocabulary of the source into line with other artifacts. Consequently, it is expected to improve the performance of existing and future IR- and NLP-based tools. As a case study, an experiment with an LSI-based feature locator is replicated. Normalization universally improves performance. For the tersest queries, this improvement is over 180% (p < 0.0001).
Keywords
computational linguistics; information retrieval; natural language processing; program compilers; program diagnostics; software maintenance; software tools; system monitoring; IR-based concept location; LSI-based feature locator; NLP; compiler; dynamic analysis tools; information retrieval; natural language processing; natural language vocabulary; software development; software evolution; software maintenance; source code; static analysis; tool support; vocabulary mismatch; vocabulary normalization; Conferences; Educational institutions; Natural language processing; Software maintenance; Vocabulary; concept location; information retrieval; vocabulary normalization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Maintenance (ICSM), 2012 28th IEEE International Conference on
Conference_Location
Trento
ISSN
1063-6773
Print_ISBN
978-1-4673-2313-0
Type
conf
DOI
10.1109/ICSM.2012.6405328
Filename
6405328