DocumentCode :
2452756
Title :
Clone detection using rolling hashing, suffix trees and dagification: A case study
Author :
Thomsen, Mikkel Jønsson ; Henglein, Fritz
Author_Institution :
Dept. of Comput. Sci. (DIKU), Univ. of Copenhagen, Copenhagen, Denmark
fYear :
2012
fDate :
4 June 2012
Firstpage :
22
Lastpage :
28
Abstract :
Microsoft Dynamics NAV is a widely used enterprise resource planning system for small and medium-sized enterprises that, by design, encourages rapid customization by copy-paste programming. We report the results of analyzing clone detection for NAV using two previously published methods and one new algorithmic method: character-based sliding window sampling using Rabin-Karp hashing (MOSS), line-based sequence matching using suffix trees (CodeDup), and abstract-syntax-tree-based graph sharing analysis (XMLClone). The last of these is piggybacked on XMLStore, which stores XML trees as directed acyclic graphs (dags) in which all isomorphic subtrees are identified and coalesced into single nodes; this can be done in linear time using multiset discrimination. This dagification discovers all well-formed Type-1 and, with suitable input normalization, Type-2 clones. We find that the subsequent dag analysis to discover Type-3 clones performs well on NAV source code, both in terms of computational complexity and precision. This suggests that efficient dagification and independently configurable dag interpretation may be valuable ingredients for modular clone detection.
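The dagification step described in the abstract can be illustrated with a minimal sketch. It is not the XMLStore or XMLClone implementation: it swaps in dictionary-based hash-consing (expected rather than worst-case linear time) for multiset discrimination, and the function name `dagify`, the key layout, and the choice to ignore tail text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def dagify(root):
    """Coalesce isomorphic subtrees of an XML tree into shared node ids.

    Hypothetical helper: returns (root_id, table, counts), where `table`
    maps a structural key to a node id and `counts` maps a node id to the
    number of times that subtree occurs in the original tree.
    """
    table = {}   # structural key -> node id (one entry per distinct subtree)
    counts = {}  # node id -> occurrences in the input tree

    def visit(elem):
        # Children are processed first, so a subtree's key depends only on
        # its tag, attributes, text, and the ids of its children.
        child_ids = tuple(visit(child) for child in elem)
        key = (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),
            child_ids,
        )
        # Reuse the existing id for an isomorphic subtree, else allocate one.
        node_id = table.setdefault(key, len(table))
        counts[node_id] = counts.get(node_id, 0) + 1
        return node_id

    return visit(root), table, counts
```

Under these assumptions, any node id whose count exceeds 1 marks a subtree that occurs more than once, which corresponds to a well-formed Type-1 clone candidate. Normalizing identifiers or literals in the input before calling `dagify` would extend the same sharing to Type-2 candidates.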
Keywords :
XML; cryptography; directed graphs; enterprise resource planning; small-to-medium enterprises; trees (mathematics); CodeDup; MOSS; Microsoft Dynamics NAV; Rabin-Karp hashing; XML trees; XMLClone; abstract-syntax-tree; character-based sliding window sampling; copy-paste programming; dagification; directed acyclic graph; enterprise resource planning system; graph sharing analysis; isomorphic subtrees; line-based sequence matching; linear time; modular clone detection; multiset discrimination; rolling hashing; small-and-medium-sized enterprises; suffix trees; Abstracts; Calibration; Cloning; Plagiarism; Syntactics; XML; CodeDup; ERP; MOSS; MS Dynamics NAV; XMLClone; XMLStore; clone; code; dagification; detection; discrimination; hashing; multiset; similarity; suffix; tree; winnowing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1794-8
Type :
conf
DOI :
10.1109/IWSC.2012.6227862
Filename :
6227862