• DocumentCode
    2008754
  • Title
    An Application of Latent Dirichlet Allocation to Analyzing Software Evolution
  • Author
    Linstead, Erik ; Lopes, Cristina ; Baldi, Pierre
  • Author_Institution
    Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, CA, USA
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    813
  • Lastpage
    818
  • Abstract
    We develop and apply unsupervised statistical topic models, in particular latent Dirichlet allocation, to identify functional components of source code and study their evolution over multiple project versions. We present results for two large, open source Java projects, Eclipse and Argo UML, which are well-known and well-studied within the software mining community. Our results demonstrate the effectiveness of probabilistic topic models in automatically summarizing the temporal dynamics of software concerns, with direct application to project management and program understanding. In addition to detecting the emergence of topics on the release timeline that represent integration points for key source code functionality, our techniques can also be used to pinpoint refactoring events in the underlying software design, as well as to identify general programming concepts whose prevalence depends only on the size of the code base being analyzed. Complete results are available from our supplementary materials website at http://sourcerer.ics.uci.edu/icmla2008/software_evolution.html.
  • Keywords
    public domain software; software engineering; statistical analysis; Argo UML; Eclipse; latent Dirichlet allocation; open source Java projects; software design; software evolution analysis; software mining community; unsupervised statistical topic models; Application software; Computer bugs; History; Information analysis; Java; Linear discriminant analysis; Machine learning; Open source software; Project management; Software engineering; latent dirichlet allocation; software evolution; software mining; topic models
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.47
  • Filename
    4725072