• DocumentCode
    2867150
  • Title

    Vocabulary normalization improves IR-based concept location

  • Author

    Binkley, David ; Lawrie, Dawn ; Uehlinger, C.

  • Author_Institution
    Comput. Sci. Dept., Loyola Univ. Maryland, Baltimore, MD, USA
  • fYear
    2012
  • fDate
    23-28 Sept. 2012
  • Firstpage
    588
  • Lastpage
    591
  • Abstract
    Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and more recently tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code, but artifacts from all phases of software construction and its subsequent evolution. Unfortunately, the natural language found in source code often uses a vocabulary different from that used in other software artifacts and thus increases the vocabulary mismatch problem. This problem exists because many natural-language tools imported from Information Retrieval (IR) and Natural Language Processing (NLP) implicitly assume the use of a single natural language vocabulary. Vocabulary normalization, which goes well beyond simple identifier splitting, brings the vocabulary of the source into line with other artifacts. Consequently, it is expected to improve the performance of existing and future IR and NLP based tools. As a case study, an experiment with an LSI-based feature locator is replicated. Normalization universally improves performance. For the tersest queries, this improvement is over 180% (p < 0.0001).
  • Keywords
    computational linguistics; information retrieval; natural language processing; program compilers; program diagnostics; software maintenance; software tools; system monitoring; IR-based concept location; LSI-based feature locator; NLP; compiler; dynamic analysis tools; information retrieval; natural language processing; natural language vocabulary; software development; software evolution; software maintenance; source code; static analysis; tool support; vocabulary mismatch; vocabulary normalization; Conferences; Educational institutions; Natural language processing; Software maintenance; Vocabulary; concept location; information retrieval; vocabulary normalization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Maintenance (ICSM), 2012 28th IEEE International Conference on
  • Conference_Location
    Trento
  • ISSN
    1063-6773
  • Print_ISBN
    978-1-4673-2313-0
  • Type

    conf

  • DOI
    10.1109/ICSM.2012.6405328
  • Filename
    6405328