Title :
Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques
Author :
Madani, Nioosha ; Guerrouj, Latifa ; Di Penta, Massimiliano ; Guéhéneuc, Yann-Gaël ; Antoniol, Giuliano
Author_Institution :
SOCCER Lab., Ecole Polytech. de Montreal, Montréal, QC, Canada
Abstract :
The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifiers guides the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or not strictly followed, and/or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired by speech recognition, namely dynamic time warping, to split identifiers into their component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results, compared against manually built oracles and against the Camel Case algorithm, are encouraging: the technique successfully recognizes the words composing identifiers (even when abbreviated) in about 90% of cases and performs better than Camel Case splitting. Furthermore, it was able to spot mistakes in the manually built oracle.
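To make the contrast in the abstract concrete, here is a minimal sketch, not the authors' implementation. This record does not give their dictionary, cost function, or abbreviation rules, so everything below is an illustrative assumption. `camel_case_split` is a hypothetical Camel Case baseline. `dtw_distance` is classic dynamic time warping over characters with a 0/1 local cost. `dtw_split` is a hypothetical dynamic-programming search over split points that picks the dictionary word with the lowest DTW cost for each segment. The toy vocabulary is also made up.

```python
import re


def camel_case_split(identifier):
    """Hypothetical baseline: split on underscores/digits and case transitions."""
    words = []
    for part in re.split(r"[_\d]+", identifier):
        # Acronym runs (e.g. "HTML" in "HTMLParser") or capitalised/lowercase words.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]


def dtw_distance(segment, word):
    """Classic DTW between two character sequences with a 0/1 local cost."""
    n, m = len(segment), len(word)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if segment[i - 1] == word[j - 1] else 1
            # Warping: a character on either side may be aligned more than once.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_split(identifier, dictionary):
    """Hypothetical splitter: segment the identifier into dictionary words
    minimising the total DTW cost (assumes a non-empty dictionary)."""
    text = re.sub(r"[^a-z]", "", identifier.lower())  # drop case, digits, '_'
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[k]: min cost of segmenting text[:k]
    back = [None] * (n + 1)          # back[k]: (segment start, chosen word)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            segment = text[start:end]
            # Closest dictionary word; ties broken by alphabetical order.
            cost, word = min((dtw_distance(segment, w), w) for w in dictionary)
            if best[start] + cost < best[end]:
                best[end], back[end] = best[start] + cost, (start, word)
    words, k = [], n
    while k > 0:  # walk the back-pointers from the end of the identifier
        start, word = back[k]
        words.append(word)
        k = start
    return words[::-1]


vocab = {"draw", "figure", "get", "file", "name", "text", "buffer"}
print(camel_case_split("drawfigure"))   # ['drawfigure']
print(dtw_split("drawfigure", vocab))   # ['draw', 'figure']
print(dtw_split("getbufer", vocab))     # ['get', 'buffer']
```

Camel Case cannot split `drawfigure`, while the dictionary-driven search can. Plain DTW also maps the misspelled `bufer` to `buffer` at zero cost, because warping lets one `f` align twice. It does not tolerate dropped letters, though: `dtw_distance("fig", "figure")` is 3. Handling genuine abbreviations would therefore need extra rules or a modified local distance beyond this sketch.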
Keywords :
software maintenance; source coding; speech recognition; text analysis; time warp simulation; Camel Case algorithm; JHotDraw; Lynx; cognitive process; dynamic time warping; information source; manually-built oracles; software engineering; software maintainability; software understandability; source code identifiers; speech recognition techniques; split identifiers; word recognition; Context; Dictionaries; Feature extraction; Heuristic algorithms; Manuals; Particle separators; Speech recognition; Source code identifiers; program comprehension;
Conference_Title :
Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on
Conference_Location :
Madrid
Print_ISBN :
978-1-61284-369-8
ISSN :
1534-5351
DOI :
10.1109/CSMR.2010.31