• DocumentCode
    2155048
  • Title
    Automatic classification of large changes into maintenance categories
  • Author
    Hindle, Abram ; German, Daniel M. ; Godfrey, Michael W. ; Holt, Richard C.
  • Author_Institution
    Univ. of Waterloo, Waterloo, ON
  • fYear
    2009
  • fDate
    17-19 May 2009
  • Firstpage
    30
  • Lastpage
    39
  • Abstract
    Large software systems undergo significant evolution during their lifespan, yet often individual changes are not well documented. In this work, we seek to automatically classify large changes into various categories of maintenance tasks - corrective, adaptive, perfective, feature addition, and non-functional improvement - using machine learning techniques. In a previous paper, we found that many commits could be classified easily and reliably based solely on the manual analysis of the commit metadata and commit messages (i.e., without reference to the source code). Our extension is the automation of classification by training machine learners on features extracted from the commit metadata, such as the word distribution of a commit message, commit author, and modules modified. We validated the results of the learners via 10-fold cross validation, which achieved accuracies consistently above 50%, indicating good to fair results. We found that the identity of the author of a commit provided much information about the maintenance class of a commit, almost as much as the words of the commit message. This implies that for most large commits, the Source Control System (SCS) commit messages plus the commit author identity is enough information to accurately and automatically categorize the nature of the maintenance task.
  • Keywords
    learning (artificial intelligence); meta data; software maintenance; task analysis; automatic classification; commit messages; commit metadata; machine learning; maintenance categories; maintenance task; software systems; source control system; Automatic control; Automation; Control systems; Feature extraction; Machine learning; Maintenance; Merging; Programming profession; Software libraries; Software systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Program Comprehension, 2009. ICPC '09. IEEE 17th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1092-8138
  • Print_ISBN
    978-1-4244-3998-0
  • Electronic_ISBN
    1092-8138
  • Type
    conf
  • DOI
    10.1109/ICPC.2009.5090025
  • Filename
    5090025
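
The approach summarized in the abstract (learners trained on commit-message words, commit author, and modified modules, then checked with k-fold cross validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the learners used, so a small naive Bayes classifier stands in as an assumption, and the `COMMITS` data, the field names, and the helpers `features`, `train`, `predict`, and `cross_validate` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy commits; real input would come from a source control log.
COMMITS = [
    {"message": "fix crash in parser", "author": "alice", "modules": ["parser"], "label": "corrective"},
    {"message": "fix null pointer bug", "author": "alice", "modules": ["core"], "label": "corrective"},
    {"message": "fix memory leak", "author": "alice", "modules": ["core"], "label": "corrective"},
    {"message": "add export feature", "author": "bob", "modules": ["ui"], "label": "feature addition"},
    {"message": "add new import dialog", "author": "bob", "modules": ["ui"], "label": "feature addition"},
    {"message": "add plugin support", "author": "bob", "modules": ["core"], "label": "feature addition"},
]

def features(commit):
    """Turn commit metadata into a bag of tokens: message words, author, modules."""
    tokens = commit["message"].lower().split()
    tokens.append("author=" + commit["author"])
    tokens.extend("module=" + m for m in commit["modules"])
    return tokens

def train(commits):
    """Count classes and per-class tokens for a multinomial naive Bayes model."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for c in commits:
        class_counts[c["label"]] += 1
        for t in features(c):
            token_counts[c["label"]][t] += 1
            vocab.add(t)
    return class_counts, token_counts, vocab

def predict(model, commit):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in features(commit):
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(commits, k=10):
    """k-fold cross validation: each commit is tested exactly once."""
    correct = 0
    for fold in range(k):
        held_out = commits[fold::k]
        training = [c for i, c in enumerate(commits) if i % k != fold]
        model = train(training)
        correct += sum(predict(model, c) == c["label"] for c in held_out)
    return correct / len(commits)

if __name__ == "__main__":
    print("cross-validated accuracy:", cross_validate(COMMITS, k=3))
```

Encoding the author and modules as prefixed tokens lets a single learner weigh them alongside message words, which is one simple way to compare how much each kind of metadata contributes.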