DocumentCode
2328853
Title
Towards the Extraction of Domain Concepts from the Identifiers
Author
Abebe, Surafel Lemma ; Tonella, Paolo
Author_Institution
Software Eng. Res. Unit, Fondazione Bruno Kessler, Trento, Italy
fYear
2011
fDate
17-20 Oct. 2011
Firstpage
77
Lastpage
86
Abstract
Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information-retrieval-based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics perform quite poorly, while a semi-automated approach, requiring limited user involvement, can greatly improve the filtering of domain concepts.
Keywords
information filtering; natural language processing; ontologies (artificial intelligence); domain concept extraction; domain concept filtering; domain concept understanding; domain ontology; information retrieval based techniques; natural language processing technique; program identifier; Documentation; Filtering; Gold; Manuals; Natural languages; Ontologies; Servers; Program understanding; domain concept filtering; information retrieval; ontology extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Reverse Engineering (WCRE), 2011 18th Working Conference on
Conference_Location
Limerick
ISSN
1095-1350
Print_ISBN
978-1-4577-1948-6
Type
conf
DOI
10.1109/WCRE.2011.19
Filename
6079777