DocumentCode
3237567
Title
A parallel and efficient approach to large scale clone detection
Author
Sajnani, Hitesh ; Lopes, Cristina
Author_Institution
Donald Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, Irvine, CA, USA
fYear
2013
fDate
19 May 2013
Firstpage
46
Lastpage
52
Abstract
Over the past few years, researchers have implemented various algorithms to improve the scalability of clone detection. Most of these algorithms focus on scaling vertically on a single machine and require complex intermediate data structures (e.g., suffix trees). However, several new use cases of clone detection have emerged that are beyond the computational capacity of a single machine. Moreover, for some of these use cases, it may be expensive to pay the upfront cost of building these data structures. In this paper, we propose a technique to scale clone detection horizontally across multiple machines using the popular MapReduce framework. The technique does not require building any complex intermediate data structures. To increase efficiency, it uses a filtering heuristic to prune the number of code block comparisons. The filtering heuristic is independent of our approach and can be used in conjunction with other approaches to increase their efficiency. In our experiments, we found that: (i) the computation time to detect clones decreases by almost half every time we double the number of nodes; and (ii) the scaleup is linear, with a decline of not more than 70% compared to the ideal case, on a cluster of 2-32 nodes for 150-2800 projects.
Keywords
data structures; software maintenance; MapReduce framework; clone detection scalability; complex intermediate data structures; filtering heuristic; horizontally scale clone detection; single machine computational capacity; Availability; Buildings; Cloning; Companies; Data structures; Frequency measurement; Indexes
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Clones (IWSC), 2013 7th International Workshop on
Conference_Location
San Francisco, CA
Type
conf
DOI
10.1109/IWSC.2013.6613042
Filename
6613042