Title :
Configuring topic models for software engineering tasks in TraceLab
Author :
Dit, Bogdan ; Panichella, A. ; Moritz, E. ; Oliveto, Rocco ; Di Penta, Massimiliano ; Poshyvanyk, Denys ; De Lucia, Andrea
Author_Institution :
Coll. of William & Mary, Williamsburg, VA, USA
Abstract :
A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although these topic models can in theory produce very good results when configured properly, in practice their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), leading to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, and which was shown to produce better results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of the topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment and make publicly available all the implemented components, the datasets, and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.
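The search described in the abstract, a genetic algorithm exploring LDA parameter settings and keeping those that score best on a topic-quality measure, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' TraceLab implementation: the parameter bounds, the GA operators (truncation selection, uniform crossover, resampling mutation, elitism), and every function name are hypothetical, and `toy_fitness` is a stand-in that does not run LDA at all. In a real setting the fitness function would train LDA with the candidate configuration and return a topic-coherence score.

```python
import random

# Hypothetical search space; these bounds are illustrative assumptions.
BOUNDS = {"num_topics": (2, 100), "alpha": (0.01, 1.0), "beta": (0.01, 1.0)}


def random_config(rng):
    """Draw one candidate LDA configuration uniformly from BOUNDS."""
    return {
        "num_topics": rng.randint(*BOUNDS["num_topics"]),
        "alpha": rng.uniform(*BOUNDS["alpha"]),
        "beta": rng.uniform(*BOUNDS["beta"]),
    }


def crossover(a, b, rng):
    """Uniform crossover: each parameter is copied from one of the two parents."""
    return {key: rng.choice((a[key], b[key])) for key in BOUNDS}


def mutate(config, rng, rate=0.2):
    """Resample each parameter independently with probability `rate`."""
    fresh = random_config(rng)
    return {key: fresh[key] if rng.random() < rate else config[key] for key in BOUNDS}


def genetic_search(fitness, generations=30, pop_size=20, elite=2, seed=0):
    """Return the best configuration found; higher fitness is better."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = ranked[:elite] + children  # elitism keeps the best so far
    return max(population, key=fitness)


def toy_fitness(config):
    """Stand-in for a topic-quality score; it does NOT train or evaluate LDA."""
    return (
        -abs(config["num_topics"] - 40)
        - abs(config["alpha"] - 0.1)
        - abs(config["beta"] - 0.1)
    )


best = genetic_search(toy_fitness)
print(best)
```

Because the two top-ranked candidates are carried over unchanged each generation, the best fitness in the population never decreases, so the returned configuration is at least as good as the best member of the initial random population.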
Keywords :
genetic algorithms; program diagnostics; software engineering; IR techniques; LDA-GA; TraceLab experiment; latent Dirichlet allocation; software engineering tasks; topic models; traceability link recovery; Genetic algorithms; Libraries; Measurement; Natural languages; Sociology; Software; Statistics; Configurable; LDA; TraceLab; experiments; genetic algorithm; traceability
Conference_Title :
Traceability in Emerging Forms of Software Engineering (TEFSE), 2013 International Workshop on
Conference_Location :
San Francisco, CA
DOI :
10.1109/TEFSE.2013.6620164