Title :
Folding Repeated Instructions for Improving Token-Based Code Clone Detection
Author :
Murakami, Hiroaki ; Hotta, Keisuke ; Higo, Yoshiki ; Igaki, Hiroshi ; Kusumoto, Shinji
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Osaka Univ., Suita, Japan
Abstract :
A variety of code clone detection methods have been proposed, but only a few of them are widely used. The widely used methods are line-based and token-based. They scale well because they require neither deep source code analysis nor the construction of complex intermediate structures, and high scalability is one of the major advantages of a code clone detection tool. On the other hand, line-based and token-based detection yields many false positives. One cause is the presence of repeated instructions in the source code. For example, assume that a C source file contains three consecutive printf statements. If we apply token-based detection to them, the first two statements are detected as a code clone of the last two statements. However, such overlapping code clones are redundant and therefore not useful to developers. In this paper, we propose a new detection method that is not affected by the presence of repeated instructions. The proposed method transforms repeated instructions into a special form and then detects code clones using a suffix array algorithm. The transformation prevents many false positives from being detected while preserving detection speed. The proposed method has been implemented as a software tool, FRISC. We confirmed its usefulness through a quantitative evaluation of FRISC using Bellon's oracle.
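The three-printf example above can be made concrete with a small sketch. It is not FRISC: it assumes each statement has already been normalized into a single token (FRISC works at the token level, and its actual "special form" is not specified here), folds every run of identical consecutive statements into one hypothetical `stmt*` token regardless of run length, and reports clone pairs from adjacent entries of a naively built suffix array. The names `fold_repeats`, `detect_clones`, the `*` marker, and the `min_len` threshold are illustrative assumptions.

```python
from typing import List, Tuple

# (start_a, start_b, length), measured in units of the input sequence
Clone = Tuple[int, int, int]


def fold_repeats(stmts: List[str]) -> List[str]:
    """Collapse each run of >= 2 identical consecutive statements into a
    single token. The 'stmt*' marker is a hypothetical folded form."""
    folded: List[str] = []
    i = 0
    while i < len(stmts):
        j = i
        while j + 1 < len(stmts) and stmts[j + 1] == stmts[i]:
            j += 1
        folded.append(stmts[i] + "*" if j > i else stmts[i])
        i = j + 1
    return folded


def suffix_array(seq: List[str]) -> List[int]:
    """Naive suffix array: sort suffix start positions lexicographically.
    Adequate for a small illustration, not for large code bases."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def common_prefix(seq: List[str], i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes at i and j."""
    n = 0
    while i + n < len(seq) and j + n < len(seq) and seq[i + n] == seq[j + n]:
        n += 1
    return n


def detect_clones(seq: List[str], min_len: int) -> List[Clone]:
    """Report pairs of suffixes that are adjacent in the suffix array and
    share at least min_len leading elements."""
    sa = suffix_array(seq)
    clones: List[Clone] = []
    for a, b in zip(sa, sa[1:]):
        length = common_prefix(seq, a, b)
        if length >= min_len:
            clones.append((min(a, b), max(a, b), length))
    return sorted(clones)


if __name__ == "__main__":
    three_printfs = ["printf(s);"] * 3
    # Unfolded: statements 0-1 are reported as a clone of statements 1-2,
    # an overlapping pair like the one described in the abstract.
    print(detect_clones(three_printfs, 2))                # [(0, 1, 2)]
    # Folded: the run becomes one token, so nothing is reported.
    print(detect_clones(fold_repeats(three_printfs), 2))  # []
```

In this sketch, folding removes the self-overlapping match inside the printf run, while a genuine duplicate such as `a; p; p; p; b;` appearing twice is still found on the folded sequence as one clone pair of length 3. Clone positions are reported in folded-token units; mapping them back to original source lines is omitted here.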
Keywords :
software tools; source coding; Bellon oracle; C source code; FRISC; folding repeated instructions; software tool; source code analysis; suffix array algorithm; token-based code clone detection; widely-used methods; Cloning; Humans; Indexes; Java; Scalability; Software tools; Code clone detection; False positive reduction; Tool comparison;
Conference_Title :
Source Code Analysis and Manipulation (SCAM), 2012 IEEE 12th International Working Conference on
Conference_Location :
Trento
Print_ISBN :
978-1-4673-2398-7
DOI :
10.1109/SCAM.2012.21