• DocumentCode
    3407429
  • Title

    Automatically mining software-based, semantically-similar words from comment-code mappings

  • Author

    Howard, Matthew J. ; Gupta, Swastik ; Pollock, Lori ; Vijay-Shanker, K.

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
  • fYear
    2013
  • fDate
    18-19 May 2013
  • Firstpage
    377
  • Lastpage
    386
  • Abstract
    Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms, as well as other word relations that indicate semantic similarity, are particularly useful for overcoming the mismatch in vocabularies. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation shows that the mined related comment-code word mappings that do not already occur in WordNet are indeed viewed as computer science, semantically-similar word pairs in high proportions.
  • Keywords
    data mining; natural language processing; software maintenance; software tools; text analysis; WordNet; automatic mining software-based semantically-similar word mining; comment-code word mapping; computer science; natural language documents; natural language words; software artifacts; software development tools; software maintenance tools; software traceability; synonyms; textual artifact representation; vocabulary mismatch; Computer science; Context; Data mining; Maintenance engineering; Semantics; Software; Tagging;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    2160-1852
  • Print_ISBN
    978-1-4799-0345-0
  • Type

    conf

  • DOI
    10.1109/MSR.2013.6624052
  • Filename
    6624052