Title :
Towards the Extraction of Domain Concepts from the Identifiers
Author :
Abebe, Surafel Lemma ; Tonella, Paolo
Author_Institution :
Software Eng. Res. Unit, Fondazione Bruno Kessler, Trento, Italy
Abstract :
Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information-retrieval-based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics perform quite poorly, while a semi-automated approach, requiring limited user involvement, can greatly improve the filtering of domain concepts.
Keywords :
information filtering; natural language processing; ontologies (artificial intelligence); domain concept extraction; domain concept filtering; domain concept understanding; domain ontology; information retrieval based techniques; natural language processing technique; program identifier; Documentation; Filtering; Gold; Manuals; Natural languages; Ontologies; Servers; Program understanding; information retrieval; ontology extraction
Conference_Title :
Reverse Engineering (WCRE), 2011 18th Working Conference on
Conference_Location :
Limerick
Print_ISBN :
978-1-4577-1948-6
DOI :
10.1109/WCRE.2011.19