DocumentCode :
2155623
Title :
An empirical exploration of regularities in open-source software lexicons
Author :
Pierret, Derrin ; Poshyvanyk, Denys
Author_Institution :
Comput. Sci. Dept., Coll. of William & Mary, Williamsburg, VA
fYear :
2009
fDate :
17-19 May 2009
Firstpage :
228
Lastpage :
232
Abstract :
The software lexicon is an important source of information during program comprehension activities, and it has been the focus of several recent case studies. Identifiers and comments, which constitute the lexicon of a software system, encode domain concepts and design decisions made by programmers. This paper presents an exploratory study that investigates regularities in the software lexicons of open-source projects by analyzing the distributions of tokens in diverse software artifacts. The study examined the source code of 142 systems from different domains, written in 12 programming languages, as well as bug reports and external documentation. We find that the distributions of lexical tokens in the studied artifacts follow the Zipf-Mandelbrot law, an empirical law from statistical natural language processing. Furthermore, the study reveals that the Zipf-Mandelbrot law is not confined to program lexicons in object-oriented languages, where previous studies observed it, but also emerges in source code written in procedural, functional, and markup languages, as well as in other software artifacts. Our study also indicates that a previously proposed software science equation does not accurately describe the relationship between program vocabulary and program length, and that further studies are needed.
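The Zipf-Mandelbrot law mentioned in the abstract relates a token's frequency f to its rank r in the frequency-sorted vocabulary as f(r) = C / (r + b)^a, where C, a, and b are fitted parameters. The following is a minimal illustrative sketch, not the authors' tooling: it assumes a simplistic identifier-like tokenizer (a regular expression that ignores comments and numeric literals), and the parameter values passed to `zipf_mandelbrot` are arbitrary placeholders rather than values fitted to any corpus.

```python
import re
from collections import Counter


def rank_frequencies(text):
    """Split text into identifier-like tokens (assumed tokenizer) and return
    (token, count) pairs sorted by descending count, so index 0 is rank 1."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    return Counter(tokens).most_common()


def zipf_mandelbrot(rank, c, a, b):
    """Frequency predicted by the Zipf-Mandelbrot law for a given rank:
    c / (rank + b) ** a. Setting b = 0 and a = 1 gives classic Zipf."""
    return c / (rank + b) ** a


source = "int count = 0; for (int i = 0; i < n; i++) { count = count + i; }"

# Compare observed counts with the law's prediction under placeholder parameters.
for rank, (token, freq) in enumerate(rank_frequencies(source), start=1):
    predicted = zipf_mandelbrot(rank, c=4.0, a=1.0, b=0.0)
    print(rank, token, freq, round(predicted, 2))
```

In practice, a and b would be estimated from a large token sample (for example, by nonlinear least squares on the rank-frequency data); a single code fragment like this one is far too small to test the law.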
Keywords :
functional languages; natural language processing; object-oriented programming; public domain software; statistical analysis; Zipf-Mandelbrot law; diverse software artifact; empirical exploration; external documentation; functional language; markup language; object-oriented programming language; open-source software lexicon; procedural language; program comprehension; program vocabulary-length; software science equation; statistical natural language processing; Application software; Computer languages; Documentation; Equations; Frequency; Java; Natural languages; Open source software; Programming profession; Software systems;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Program Comprehension, 2009. ICPC '09. IEEE 17th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1092-8138
Print_ISBN :
978-1-4244-3998-0
Electronic_ISBN :
1092-8138
Type :
conf
DOI :
10.1109/ICPC.2009.5090047
Filename :
5090047