DocumentCode :
2577791
Title :
Software Clustering: Unifying Syntactic and Semantic Features
Author :
Misra, Janardan ; Annervaz, K.M. ; Kaulgud, Vikrant ; Sengupta, Shubhashis ; Titus, Gary
Author_Institution :
Accenture Technol. Labs., Bangalore, India
fYear :
2012
fDate :
15-18 Oct. 2012
Firstpage :
113
Lastpage :
122
Abstract :
Software clustering is an important technique for extracting a high-level component architecture from the underlying source code. One limitation of existing approaches is that most of the proposed techniques use only similar types of features to estimate the distance between source code elements. Consequently, where the selected features are sparse in the source code, these techniques may not produce good-quality results because they lack adequate input. In this paper we propose an approach to overcome this limitation. The proposed approach combines multiple types of features and applies automated weighting to the extracted features to enhance their information quality and reduce noise. We define a way to estimate the distance between code elements in terms of a combination of multiple types of features. Weighted graph partitioning with a multi-objective global modularity criterion is used to select the clusters as architectural components. We describe methods for automated labeling of the extracted components and for generating inter-component interactions. We further discuss how the suggested approach extends to clustering at multiple hierarchical levels, to application portfolios, and even to improving precision in the feature location problem.
Keywords :
feature extraction; graph theory; object-oriented programming; pattern clustering; program diagnostics; software architecture; application portfolios; automated component labeling; cluster selection; distance estimation; feature location problem; high level component architecture extraction; information quality enhancement; inter-component interaction generation; multiobjective global modularity criterion; multiple hierarchical levels; noise reduction; semantic features; software clustering; source code elements; syntactic features; weighted graph partitioning; reverse engineering; architectural recovery; component discovery; latent semantic indexing; lexical analysis; program comprehension; vector space model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Reverse Engineering (WCRE), 2012 19th Working Conference on
Conference_Location :
Kingston, ON
ISSN :
1095-1350
Print_ISBN :
978-1-4673-4536-1
Type :
conf
DOI :
10.1109/WCRE.2012.21
Filename :
6385107