• DocumentCode
    3237567
  • Title

    A parallel and efficient approach to large scale clone detection

  • Author

    Sajnani, Hitesh; Lopes, Cristina

  • Author_Institution
    Donald Bren Sch. of Inf. & Comput. Sci., Univ. of California, Irvine, Irvine, CA, USA
  • fYear
    2013
  • fDate
    19 May 2013
  • Firstpage
    46
  • Lastpage
    52
  • Abstract
    Over the past few years, researchers have implemented various algorithms to improve the scalability of clone detection. Most of these algorithms focus on scaling vertically on a single machine and require complex intermediate data structures (e.g., suffix trees). However, several new use cases of clone detection have emerged that are beyond the computational capacity of a single machine. Moreover, for some of these use cases, it may be expensive to invest upfront in building these data structures. In this paper, we propose a technique to horizontally scale clone detection across multiple machines using the popular MapReduce framework. The technique does not require building any complex intermediate data structures. Moreover, to increase efficiency, the technique uses a filtering heuristic to prune the number of code block comparisons. The filtering heuristic is independent of our approach and can be used in conjunction with other approaches to increase their efficiency. In our experiments, we found that: (i) the computation time to detect clones decreases by almost half every time we double the number of nodes; and (ii) the scale-up is linear, with a decline of not more than 70% compared to the ideal case, on a cluster of 2-32 nodes for 150-2800 projects.
  • Keywords
    data structures; software maintenance; MapReduce framework; clone detection scalability; complex intermediate data structures; filtering heuristic; horizontally scale clone detection; single machine computational capacity; Availability; Buildings; Cloning; Companies; Data structures; Frequency measurement; Indexes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Clones (IWSC), 2013 7th International Workshop on
  • Conference_Location
    San Francisco, CA
  • Type

    conf

  • DOI
    10.1109/IWSC.2013.6613042
  • Filename
    6613042