DocumentCode :
2453029
Title :
Shuffling and randomization for scalable source code clone detection
Author :
Keivanloo, Iman ; Roy, Chanchal K. ; Rilling, Juergen ; Charland, Philippe
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, QC, Canada
fYear :
2012
fDate :
4-4 June 2012
Firstpage :
82
Lastpage :
83
Abstract :
In this research, we present a novel approach that allows existing state-of-the-art clone detection tools to scale to very large datasets. A key benefit of our approach is that the improved tool scalability is achieved using standard hardware and without modifying the original implementations of the subject tools. We use a hybrid approach comprising shuffling, repetition, and random subset generation of the subject dataset. As part of the experimental evaluation, we applied our shuffling and randomization approach to two state-of-the-art clone detection tools. Our experience shows that it is possible to scale the classical tools to a very large dataset using standard hardware, without significantly affecting the overall recall, while exploiting all the strengths of the original tools, including their precision.
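The abstract names three ingredients (shuffling, repetition, and random subset generation) but does not give the procedure. The following is only a minimal sketch of how such a wrapper around an unmodified detector might look, not the authors' implementation. The names `detect_clones_by_sampling`, `toy_detector`, `subset_size`, and `rounds` are hypothetical, and the toy detector stands in for a real clone detection tool.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Set

ClonePair = FrozenSet[str]
Detector = Callable[[List[str]], Set[ClonePair]]


def toy_detector(subset: List[str]) -> Set[ClonePair]:
    """Stand-in for a real tool (assumption): two files are 'clones'
    when their names share the same first character."""
    items = sorted(subset)
    return {
        frozenset((a, b))
        for i, a in enumerate(items)
        for b in items[i + 1:]
        if a[0] == b[0]
    }


def detect_clones_by_sampling(
    files: Iterable[str],
    detector: Detector,
    subset_size: int,
    rounds: int,
    seed: int = 0,
) -> Set[ClonePair]:
    """Run an unmodified detector on small random subsets and merge results.

    Each round shuffles the file list (shuffling), cuts it into chunks of
    at most subset_size files (random subset generation), and runs the
    detector on every chunk. Repeating for several rounds (repetition)
    gives pairs missed in one round a chance to share a chunk later.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    rng = random.Random(seed)
    pool = list(files)
    found: Set[ClonePair] = set()
    for _ in range(rounds):
        rng.shuffle(pool)
        for start in range(0, len(pool), subset_size):
            found |= detector(pool[start:start + subset_size])
    return found


if __name__ == "__main__":
    files = ["a1", "a2", "b1", "b2", "c1"]
    pairs = detect_clones_by_sampling(files, toy_detector, subset_size=3, rounds=5)
    for pair in sorted(sorted(p) for p in pairs):
        print(pair)
```

Under these assumptions, every reported pair comes straight from the underlying detector, so its precision carries over, while recall depends on how often a true clone pair happens to land in the same subset across rounds.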
Keywords :
software engineering; software tools; clone detection tools; random subset generation; randomization; scalable source code clone detection; shuffling; tools scalability; Cloning; Computer science; Educational institutions; Gold; Hardware; Scalability; Standards; Clone detection; sampling; scalability; shuffling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1794-8
Type :
conf
DOI :
10.1109/IWSC.2012.6227875
Filename :
6227875