DocumentCode :
3407483
Title :
A dataset for evaluating identifier splitters
Author :
Binkley, David ; Lawrie, Dawn ; Pollock, Lori ; Hill, Emily ; Vijay-Shanker, K.
Author_Institution :
Loyola Univ. Maryland, Baltimore, MD, USA
fYear :
2013
fDate :
18-19 May 2013
Firstpage :
401
Lastpage :
404
Abstract :
Software engineering and evolution techniques have recently started to exploit the natural language information in source code. A key step in doing so is splitting identifiers into their constituent words. While simple in concept, identifier splitting raises several challenging issues, leading to a range of splitting techniques. Consequently, the research community would benefit from a dataset (i.e., a gold set) that facilitates comparative studies of identifier splitting techniques. A gold set of 2,663 split identifiers was constructed from 8,522 individual human splitting judgements and can be obtained from www.cs.loyola.edu/~binkley/ludiso. This set's construction and observations aimed at its effective use are described.
Keywords :
computational linguistics; program interpreters; software engineering; source coding; constituent words; human splitting judgements; identifier splitter evaluation dataset; identifier splitting techniques; natural language information; software engineering; software evolution techniques; source code; Data mining; Educational institutions; Gold; Java; Software; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
Conference_Location :
San Francisco, CA
ISSN :
2160-1852
Print_ISBN :
978-1-4799-0345-0
Type :
conf
DOI :
10.1109/MSR.2013.6624055
Filename :
6624055