Title :
Towards a Big Data Curated Benchmark of Inter-project Code Clones
Author :
Svajlenko, Jeffrey ; Islam, Judith F. ; Keivanloo, Iman ; Roy, Chanchal K. ; Mia, Mohammad Mamun
Author_Institution :
Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
Date :
Sept. 29 - Oct. 3, 2014
Abstract :
Recently, new applications of code clone detection and search have emerged that rely upon clones detected across thousands of software systems. Big Data clone detection and search algorithms have been proposed as an embedded part of these new applications. However, no benchmark data has previously existed for evaluating the recall and precision of these emerging techniques. In this paper, we present a Big Data clone detection benchmark that consists of known true and false positive clones in a Big Data inter-project Java repository. The benchmark was built by mining and then manually checking clones of ten common functionalities. The benchmark contains six million true positive clones of different clone types: Type-1, Type-2, Type-3 and Type-4, including various strengths of Type-3 similarity (strong, moderate, weak). These clones were found by three judges over 216 hours of manual validation effort. We show how the benchmark can be used to measure the recall and precision of clone detection techniques.
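As a rough illustration of the last point in the abstract, the sketch below scores a tool's reported clone pairs against a benchmark of known true and false positive clones. It is not the procedure from the paper: the function names, the pair representation, and the fragment identifiers are all hypothetical, and precision is estimated only over reported pairs that the benchmark has actually judged.

```python
def normalize(pair):
    """Order a clone pair so that (a, b) and (b, a) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def evaluate(reported, true_clones, false_clones):
    """Return (recall, precision) for a tool's reported pairs.

    Recall is the fraction of known true clones that the tool reported.
    Precision is estimated over the reported pairs that the benchmark
    has judged (known true or known false); unjudged pairs are ignored.
    """
    reported = {normalize(p) for p in reported}
    true_clones = {normalize(p) for p in true_clones}
    false_clones = {normalize(p) for p in false_clones}

    hits = reported & true_clones      # reported and known to be true clones
    misses = reported & false_clones   # reported but known to be false positives

    recall = len(hits) / len(true_clones) if true_clones else None
    judged = len(hits) + len(misses)
    precision = len(hits) / judged if judged else None
    return recall, precision


# Toy data; the fragment identifiers are made up for illustration.
true_clones = [
    ("A.java:10-20", "B.java:5-15"),
    ("A.java:10-20", "C.java:1-11"),
    ("B.java:5-15", "C.java:1-11"),
    ("D.java:3-9", "E.java:7-13"),
]
false_clones = [("A.java:10-20", "F.java:2-8")]
reported = [
    ("B.java:5-15", "A.java:10-20"),   # known true clone, reversed order
    ("C.java:1-11", "B.java:5-15"),    # known true clone, reversed order
    ("F.java:2-8", "A.java:10-20"),    # known false positive
    ("X.java:1-5", "Y.java:1-5"),      # not judged by the benchmark
]

recall, precision = evaluate(reported, true_clones, false_clones)
```

On this toy data the tool finds two of the four known true clones and one known false positive, so recall is 2/4 and the precision estimate is 2/3. Restricting `true_clones` to a single clone type (for example, only Type-3 pairs) would give a per-type recall in the same way.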
Keywords :
Big Data; Java; data mining; project management; Big Data clone detection benchmark; Big Data curated benchmark; Big Data interproject Java repository; code clone detection; code clone search; false positive clones; interproject code clones; manual clone checking; moderate-similarity; precision evaluation; recall evaluation; software systems; strong-similarity; true positive clones; type-1 clones; type-2 clones; type-3 clones; type-3 similarity; type-4 clones; weak-similarity; Benchmark testing; Big data; Cloning; Detectors; Manuals; Tagging; Big Clone Bench; benchmark; big data; clone detection; clone search; precision; recall; semantic similarity; syntactic similarity
Conference_Title :
2014 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Conference_Location :
Victoria, BC
DOI :
10.1109/ICSME.2014.77