Title :
Automatic software clustering via Latent Semantic Analysis
Author :
Maletic, Jonathan I. ; Valluri, Naveen
Author_Institution :
Dept. of Math. Sci., Memphis Univ., TN, USA
Abstract :
The paper describes the initial results of applying Latent Semantic Analysis (LSA) to program source code and associated documentation. Latent Semantic Analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) as reflected in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). The intent of applying Latent Semantic Analysis to software components is to automatically induce a specific semantic meaning of a given component. Here LSA is used as the basis to cluster software components. Results of applying this method to the LEDA library and MINIX operating system are given. Applying Latent Semantic Analysis to the domain of source code and internal documentation for the support of software reuse is a new application of this method and a departure from the normal application domain of natural language.
Keywords :
automatic programming; computational linguistics; natural languages; software reusability; software tools; statistical analysis; LEDA library; LSA; Latent Semantic Analysis; MINIX operating system; automatic software clustering; corpus based statistical method; documentation; internal documentation; natural language; normal application domain; program source code; semantic meaning; software components; software reuse; Application software; Computer science; Documentation; Identity-based encryption; Matrix decomposition; Natural languages; Operating systems; Read only memory; Sparse matrices; Statistical analysis;
Conference_Title :
Automated Software Engineering, 1999. 14th IEEE International Conference on.
Conference_Location :
Cocoa Beach, FL
Print_ISBN :
0-7695-0415-9
DOI :
10.1109/ASE.1999.802296