DocumentCode
3235683
Title
Automatic Derivation of Concepts Based on the Analysis of Source Code Identifiers
Author
Guerrouj, Latifa
Author_Institution
DGIGL - SOCCER Lab., Ecole Polytech. de Montreal, Montréal, QC, Canada
fYear
2010
fDate
13-16 Oct. 2010
Firstpage
301
Lastpage
304
Abstract
The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main up-to-date source of information and guide their cognitive processes during program understanding when high-level documentation is scarce or outdated and when the source code is not sufficiently commented. Deriving domain terms from identifiers using high-level and domain concepts is not an easy task when naming conventions (e.g., Camel Case) are not used or not strictly followed, and/or when words have been abbreviated or otherwise transformed. Our thesis is to develop an approach that overcomes the shortcomings of existing approaches and maps identifiers to domain concepts even in the absence of naming conventions and/or in the presence of abbreviations. Our approach uses a thesaurus of words and abbreviations to map the terms or transformed words composing identifiers to dictionary words. It relies on an oracle that we build manually to validate our results. To evaluate our technique, we apply it to derive concepts from the identifiers of different systems and open-source projects. We also enrich it with domain knowledge and context-aware dictionaries to analyze how sensitive its performance is to contextual information and specialized knowledge.
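To illustrate the general idea of mapping identifier fragments to dictionary words without relying on naming conventions, here is a minimal sketch. It is not the author's method: it substitutes a simple dictionary-based dynamic-programming splitter that prefers the fewest segments. The toy `DICTIONARY` and `ABBREVIATIONS` thesaurus, and the function names `expand`, `_segment`, and `split_identifier`, are hypothetical and only for illustration.

```python
import re
from typing import List, Optional

# Hypothetical toy dictionary and abbreviation thesaurus (not from the paper).
DICTIONARY = {"get", "user", "name", "file", "open", "count", "max", "buffer", "size"}
ABBREVIATIONS = {"usr": "user", "nm": "name", "cnt": "count", "buf": "buffer", "sz": "size"}


def expand(token: str) -> Optional[str]:
    """Return the dictionary word a token maps to, or None if it is unknown."""
    if token in DICTIONARY:
        return token
    return ABBREVIATIONS.get(token)


def _segment(run: str) -> Optional[List[str]]:
    """Split one lowercase letter run into the fewest expandable tokens."""
    n = len(run)
    # best[i] is the shortest list of expanded words covering run[i:], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            word = expand(run[i:j])
            rest = best[j]
            if word is not None and rest is not None:
                candidate = [word] + rest
                current = best[i]
                if current is None or len(candidate) < len(current):
                    best[i] = candidate
    return best[0]


def split_identifier(identifier: str) -> Optional[List[str]]:
    """Map an identifier to dictionary words, ignoring case conventions.

    Non-letters (underscores, digits) act as separators; each remaining
    letter run is lowercased, so Camel Case boundaries are not required.
    Returns None if any run cannot be fully covered.
    """
    runs = [r.lower() for r in re.split(r"[^A-Za-z]+", identifier) if r]
    words: List[str] = []
    for run in runs:
        segment = _segment(run)
        if segment is None:
            return None
        words.extend(segment)
    return words


if __name__ == "__main__":
    print(split_identifier("getUsrNm"))    # ['get', 'user', 'name']
    print(split_identifier("max_buf_sz"))  # ['max', 'buffer', 'size']
```

Preferring the fewest segments is one simple way to resolve ambiguous splits; a real system would likely need richer scoring and context-aware dictionaries, as the abstract suggests.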
Keywords
software maintenance; system documentation; automatic derivation; cognitive processes; domain concepts; high-level documentation; maps identifiers; naming conventions; program understanding; software engineering; software maintainability; software understandability; source code identifiers; Buildings; Conferences; Dictionaries; Presses; Software; Speech recognition; Thesauri; Identifier Splitting; Linguistic Analysis; Program Comprehension; Software Quality;
fLanguage
English
Publisher
ieee
Conference_Titel
Reverse Engineering (WCRE), 2010 17th Working Conference on
Conference_Location
Beverly, MA
ISSN
1095-1350
Print_ISBN
978-1-4244-8911-4
Type
conf
DOI
10.1109/WCRE.2010.45
Filename
5645490