Author_Institution :
SOCCER Lab., Ecole Polytech. de Montreal, Montreal, QC, Canada
Abstract :
Concept identification is the task of locating and identifying concepts (e.g., domain concepts) in code regions or, more generally, in artifact chunks. Concept identification is fundamental to program comprehension, software maintenance, and evolution. The literature offers static, dynamic, and hybrid approaches to concept identification. Static and dynamic techniques each have advantages and limitations; in fact, they can be considered complementary. Indeed, recent works have focused on hybrid techniques to improve both the running time and the accuracy (i.e., precision and recall) of the concept location process. Furthermore, sometimes only a single execution trace is available; however, to the best of our knowledge, only a few works attempt to automatically identify concepts in a single execution trace. We propose an approach built upon a dynamic-programming algorithm that splits an execution trace into segments likely to represent concepts. The approach improves performance and scalability with respect to currently available techniques. We also plan to use techniques derived from Latent Dirichlet Allocation (LDA) to automatically assign meanings to segments.
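To make the idea of splitting a trace with dynamic programming concrete, here is a minimal sketch of optimal trace segmentation. It is not the authors' algorithm: the abstract does not state the segment cost, so this sketch assumes each trace event is reduced to a single label (e.g., the class of the called method) and scores a segment by a hypothetical "impurity" (events not matching the segment's most frequent label) plus a fixed per-segment `penalty`. The names `segment_trace` and `penalty` are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def segment_trace(events: Sequence[str], penalty: float) -> List[Tuple[int, int]]:
    """Split a trace into half-open (start, end) segments.

    Hypothetical cost model (an assumption, not from the source):
    total cost = sum of segment impurities + penalty * number of segments,
    where impurity = segment length - count of its most frequent label.
    """
    n = len(events)
    # best[i]: minimum cost of segmenting the prefix events[:i]
    best = [0.0] + [float("inf")] * n
    # back[i]: start index of the last segment in an optimal split of events[:i]
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        counts = Counter()
        top = 0  # highest label count seen in events[start:end]
        # Grow the candidate last segment leftwards, updating counts incrementally.
        for start in range(end - 1, -1, -1):
            counts[events[start]] += 1
            top = max(top, counts[events[start]])
            impurity = (end - start) - top
            cost = best[start] + impurity + penalty
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Walk the back-pointers from the end of the trace to recover the segments.
    segments: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return segments


if __name__ == "__main__":
    trace = ["Login"] * 3 + ["Cart"] * 3 + ["Pay"] * 2
    print(segment_trace(trace, penalty=1.0))   # small penalty: one segment per label run
    print(segment_trace(trace, penalty=10.0))  # large penalty: a single segment
```

The double loop runs in O(n²) time for a trace of n events. The `penalty` acts as the knob trading off fewer, coarser segments against more, purer ones. A real concept-mining approach would replace the impurity term with a measure better suited to concepts, for example one based on the identifiers or text of the called methods.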
Keywords :
data mining; dynamic programming; program diagnostics; software maintenance; artifact chunks; concept identification; concept location process; dynamic programming algorithm; execution traces; latent dirichlet allocation; program comprehension; scalable automatic concept mining; software evolution; Data mining; Heuristic algorithms; Information retrieval; Resource management; Scalability; Software maintenance; Concept identification; Dynamic analysis; Latent Dirichlet Allocation;