DocumentCode :
700370
Title :
Threshold-free code clone detection for a large-scale heterogeneous Java repository
Author :
Keivanloo, Iman ; Feng Zhang ; Ying Zou
Author_Institution :
Dept. of Electr. & Comput. Eng., Queen's Univ., Kingston, ON, Canada
fYear :
2015
fDate :
2-6 March 2015
Firstpage :
201
Lastpage :
210
Abstract :
Code clones are unavoidable entities in software ecosystems. A variety of clone-detection algorithms are available for finding them. For Type-3 clone detection at method granularity (i.e., similar methods with changes in statements), the dissimilarity threshold is one of the possible configuration parameters. Existing approaches use a single threshold to detect Type-3 clones across a repository. However, our study shows that detecting Type-3 clones at method granularity in a large-scale heterogeneous repository often requires multiple thresholds. We find that clone-detection performance improves when different thresholds are selected for different groups of clones in a heterogeneous repository (i.e., various applications). In this paper, we propose a threshold-free approach to detect Type-3 clones at method granularity across a large number of applications. Our approach uses an unsupervised learning algorithm, k-means, to separate true clones from false clones. We use a clone benchmark with 330,840 tagged clones from 24,824 open source Java projects for our study. We observe that our approach improves performance significantly, by 12% in terms of F-measure. Furthermore, our threshold-free approach eliminates practitioners' concern about possible misconfiguration of Type-3 clone detection tools.
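The abstract does not say which features are clustered or how the resulting clusters are labelled, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each clone candidate within one group carries a single dissimilarity score in [0, 1], runs plain k-means with k = 2 on those scores, and treats the low-dissimilarity cluster as the true clones. The names `two_means_1d` and `split_candidates` and the sample scores are hypothetical.

```python
def two_means_1d(scores, iterations=100):
    """Plain k-means with k=2 on one-dimensional scores.

    Returns (labels, (low_centroid, high_centroid)); label 0 marks the
    low-dissimilarity cluster. Assumption: one score per clone candidate.
    """
    if not scores:
        return [], (None, None)
    lo, hi = min(scores), max(scores)  # start the centroids at the extremes
    if lo == hi:
        # All scores are identical, so there is nothing to separate.
        return [0] * len(scores), (lo, hi)
    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assignment step: attach each score to the nearest centroid.
        labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in scores]
        low = [s for s, lab in zip(scores, labels) if lab == 0]
        high = [s for s, lab in zip(scores, labels) if lab == 1]
        # Update step: move each centroid to the mean of its members.
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)


def split_candidates(candidates):
    """Split (pair_id, dissimilarity) tuples into accepted and rejected ids.

    Assumption: the low-dissimilarity cluster holds the true clones.
    """
    labels, _ = two_means_1d([score for _, score in candidates])
    accepted = [pid for (pid, _), lab in zip(candidates, labels) if lab == 0]
    rejected = [pid for (pid, _), lab in zip(candidates, labels) if lab == 1]
    return accepted, rejected


# Hypothetical candidates from a single group (e.g., one application).
candidates = [("a", 0.05), ("b", 0.10), ("c", 0.12), ("d", 0.70), ("e", 0.80)]
accepted, rejected = split_candidates(candidates)
```

Because the split is recomputed for each group of candidates, the boundary between accepted and rejected pairs adapts to that group's score distribution, so no global dissimilarity threshold has to be configured.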
Keywords :
Java; public domain software; F-measure; clone benchmark; dissimilarity threshold; heterogeneous Java repository; method granularity; open source Java projects; software ecosystems; threshold-free code clone detection algorithms; type-3 clone detection tools; Benchmark testing; Cloning; Clustering algorithms; Google; Optimization methods; Software systems; clone detection; clone search; clustering; large-scale repository; threshold-free; unsupervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on
Conference_Location :
Montreal, QC
Type :
conf
DOI :
10.1109/SANER.2015.7081830
Filename :
7081830