• DocumentCode
    3128506
  • Title

    Analyzing the Evolution of the Source Code Vocabulary

  • Author

    Abebe, Surafel Lemma ; Haiduc, Sonia ; Marcus, Andrian ; Tonella, Paolo ; Antoniol, Giuliano

  • Author_Institution
    FBK-irst, Trento
  • fYear
    2009
  • fDate
    24-27 March 2009
  • Firstpage
    189
  • Lastpage
    198
  • Abstract
    Source code is a mixed software artifact, containing information for both the compiler and the developers. While programming language grammar dictates how the source code is written, developers have a lot of freedom in writing identifiers and comments. These are intentional in nature and become means of communication between developers. The goal of this paper is to analyze how the source code vocabulary changes during evolution, through an exploratory study of two software systems. Specifically, we collected data to answer a set of questions about the vocabulary evolution, such as: How does the size of the source code vocabulary evolve over time? What do most frequent terms refer to? Are new identifiers introducing new terms? Are there terms shared between different types of identifiers and comments? Are new and deleted terms in a type of identifiers mirrored in other types of identifiers or in comments?
  • Keywords
    software maintenance; programming language grammar; software artifact; software systems; source code; source code vocabulary; Computer languages; Computer science; Guidelines; Information analysis; Knowledge management; Software engineering; Software maintenance; Software systems; Vocabulary; Writing; Lexicon evolution; Software vocabulary; Text mining
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Software Maintenance and Reengineering, 2009. CSMR '09. 13th European Conference on
  • Conference_Location
    Kaiserslautern
  • ISSN
    1534-5351
  • Print_ISBN
    978-0-7695-3589-0
  • Type

    conf

  • DOI
    10.1109/CSMR.2009.61
  • Filename
    4812752