• DocumentCode
    787702
  • Title
    CCFinder: a multilinguistic token-based code clone detection system for large scale source code

  • Author
    Kamiya, Toshihiro ; Kusumoto, Shinji ; Inoue, Katsuro

  • Author_Institution
    Graduate Sch. of Eng. Sci., Osaka Univ., Japan
  • Volume
    28
  • Issue
    7
  • fYear
    2002
  • fDate
    7/1/2002 12:00:00 AM
  • Firstpage
    654
  • Lastpage
    670
  • Abstract
    A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
  • Keywords
    computer aided software engineering; high level languages; large-scale systems; optimising compilers; software maintenance; software metrics; software tools; C language; C++ language; CASE tool; CCFinder; COBOL; FreeBSD; JDK; Java; Java Development Kit; Linux; NetBSD; case studies; duplicated code; input source text transformation; large-scale source code; multi-linguistic token-based code clone detection system; optimization techniques; software maintainability; system characteristics identification; token-by-token comparison; Cloning; Computer aided software engineering; Large-scale systems; Maintenance engineering; Programming profession; Software maintenance; Software systems; Software tools
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2002.1019480
  • Filename
    1019480