• DocumentCode
    2328853
  • Title
    Towards the Extraction of Domain Concepts from the Identifiers
  • Author
    Abebe, Surafel Lemma; Tonella, Paolo
  • Author_Institution
    Software Eng. Res. Unit, Fondazione Bruno Kessler, Trento, Italy
  • fYear
    2011
  • fDate
    17-20 Oct. 2011
  • Firstpage
    77
  • Lastpage
    86
  • Abstract
    Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information-retrieval-based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics perform poorly, while a semi-automated approach, requiring limited user involvement, can substantially improve the filtering of domain concepts.
  • Keywords
    information filtering; natural language processing; ontologies (artificial intelligence); domain concept extraction; domain concept filtering; domain concept understanding; domain ontology; information retrieval based techniques; natural language processing technique; program identifier; Documentation; Filtering; Gold; Manuals; Natural languages; Ontologies; Servers; Program understanding; information retrieval; ontology extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reverse Engineering (WCRE), 2011 18th Working Conference on
  • Conference_Location
    Limerick
  • ISSN
    1095-1350
  • Print_ISBN
    978-1-4577-1948-6
  • Type
    conf
  • DOI
    10.1109/WCRE.2011.19
  • Filename
    6079777