• DocumentCode
    1974907
  • Title
    Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code
  • Author
    Kuhn, Adrian
  • Author_Institution
    Software Composition Group, Univ. of Bern, Bern
  • fYear
    2009
  • fDate
    16-17 May 2009
  • Firstpage
    175
  • Lastpage
    178
  • Abstract
    As more and more open-source software components become available on the Internet, we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper, we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
  • Keywords
    information retrieval; object-oriented programming; public domain software; software prototyping; text analysis; vocabulary; Internet; automatic software component labeling; label retrieval; log-likelihood ratio; open-source software component; software component evolution; software vocabulary; source code word frequency analysis; Application software; Frequency; Gaussian distribution; Internet; Labeling; Open source software; Prototypes; Software prototyping; Software systems; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mining Software Repositories, 2009. MSR '09. 6th IEEE International Working Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4244-3493-0
  • Type
    conf
  • DOI
    10.1109/MSR.2009.5069499
  • Filename
    5069499
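
The abstract describes labeling components by the log-likelihood ratio of word frequencies. The sketch below illustrates that general idea only and is not the paper's prototype: it assumes Dunning's G² statistic over two token lists, and the names `log_likelihood_ratio` and `label`, the `top` parameter, and the rule of keeping only over-represented words are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def log_likelihood_ratio(k1, n1, k2, n2):
    """G^2 for a word seen k1 times in n1 tokens versus k2 times in n2 tokens.

    Assumes n1 > 0 and n2 > 0.
    """
    def ll(k, n, p):
        # Binomial log-likelihood; terms with a zero count are skipped
        # so that 0 * log(0) is treated as 0.
        total = 0.0
        if k > 0:
            total += k * math.log(p)
        if n - k > 0:
            total += (n - k) * math.log(1 - p)
        return total

    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the "same frequency" hypothesis
    return 2 * (ll(k1, n1, p1) + ll(k2, n2, p2) - ll(k1, n1, p) - ll(k2, n2, p))


def label(component_words, reference_words, top=3):
    """Return up to `top` words that are most over-represented in the component."""
    comp, ref = Counter(component_words), Counter(reference_words)
    n1, n2 = sum(comp.values()), sum(ref.values())
    scores = {}
    for word, k1 in comp.items():
        k2 = ref[word]
        # Assumption: only words relatively more frequent in the component are labels.
        if k1 / n1 > k2 / n2:
            scores[word] = log_likelihood_ratio(k1, n1, k2, n2)
    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    component = ["parser"] * 4 + ["get", "set"]
    reference = ["get", "set"] * 5 + ["parser"]
    print(label(component, reference))
```

In this sketch the token lists stand in for identifiers and comments extracted from source code; how the words are extracted and which reference corpus is used are left open here.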