• DocumentCode
    700385
  • Title

    CloCom: Mining existing source code for automatic comment generation

  • Author

    Edmund Wong ; Taiyue Liu ; Lin Tan

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2015
  • fDate
    2-6 March 2015
  • Firstpage
    380
  • Lastpage
    389
  • Abstract
    Code comments are an integral part of software development. They improve program comprehension and software maintainability. The lack of code comments is a common problem in the software industry. Therefore, it is beneficial to generate code comments automatically. In this paper, we propose a general approach to generate code comments automatically by analyzing existing software repositories. We apply code clone detection techniques to discover similar code segments and use the comments from some code segments to describe the other similar code segments. We leverage natural language processing techniques to select relevant comment sentences. In our evaluation, we analyze 42 million lines of code from 1,005 open source projects from GitHub, and use them to generate 359 code comments for 21 Java projects. We manually evaluate the generated code comments and find that only 23.7% of the generated code comments are good. We report to the developers the good code comments, whose code segments do not have an existing code comment. Amongst the reported code comments, seven have been confirmed by the developers as good and committable to the software repository while the rest await for developers' confirmation. Although our approach can generate good and committable comments, we still have to improve the yield and accuracy of the proposed approach before it can be used in practice with full automation.
  • Keywords
    Java; data mining; natural language processing; public domain software; software maintenance; source code (software); CloCom; GitHub; Java projects; automatic comment generation; code clone detection techniques; code comments; existing source code mining; natural language processing techniques; open source projects; program comprehension; relevant comment sentences; similar code segments; software development; software industry; software maintainability; software repositories; software repository; Cloning; Context; Data mining; Databases; Java; Pattern matching; Software; comment generation; documentation; program comprehension;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on
  • Conference_Location
    Montreal, QC
  • Type

    conf

  • DOI
    10.1109/SANER.2015.7081848
  • Filename
    7081848