• DocumentCode
    3235683
  • Title
    Automatic Derivation of Concepts Based on the Analysis of Source Code Identifiers
  • Author
    Guerrouj, Latifa
  • Author_Institution
    DGIGL - SOCCER Lab., Ecole Polytech. de Montreal, Montréal, QC, Canada
  • fYear
    2010
  • fDate
    13-16 Oct. 2010
  • Firstpage
    301
  • Lastpage
    304
  • Abstract
    The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main up-to-date source of information and guide their cognitive processes during program understanding when the high-level documentation is scarce or outdated and when the source code is not sufficiently commented. Deriving domain terms from identifiers using high-level and domain concepts is not an easy task when naming conventions (e.g., Camel Case) are not used or strictly followed and/or when these words have been abbreviated or otherwise transformed. Our thesis is to develop an approach that overcomes the shortcomings of the existing approaches and maps identifiers to domain concepts even in the absence of naming conventions and/or the presence of abbreviations. Our approach uses a thesaurus of words and abbreviations to map terms or transformed words composing identifiers to dictionary words. It relies on an oracle that we manually build for the validation of our results. To evaluate our technique, we apply it to derive concepts from identifiers of different systems and open source projects. We also enrich it with domain knowledge and context-aware dictionaries to analyze how sensitive its performance is to the use of contextual information and specialized knowledge.
  • Keywords
    software maintenance; system documentation; automatic derivation; cognitive processes; domain concepts; high-level documentation; maps identifiers; naming conventions; program understanding; software engineering; software maintainability; software understandability; source code identifiers; Buildings; Conferences; Dictionaries; Presses; Software; Speech recognition; Thesauri; Identifier Splitting; Linguistic Analysis; Program Comprehension; Software Quality;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reverse Engineering (WCRE), 2010 17th Working Conference on
  • Conference_Location
    Beverly, MA
  • ISSN
    1095-1350
  • Print_ISBN
    978-1-4244-8911-4
  • Type
    conf
  • DOI
    10.1109/WCRE.2010.45
  • Filename
    5645490