DocumentCode :
2328747
Title :
On the Effectiveness of Simhash for Detecting Near-Miss Clones in Large Scale Software Systems
Author :
Uddin, Md Sharif ; Roy, Chanchal K. ; Schneider, Kevin A. ; Hindle, Abram
Author_Institution :
Univ. of Saskatchewan, Saskatoon, SK, Canada
fYear :
2011
fDate :
17-20 Oct. 2011
Firstpage :
13
Lastpage :
22
Abstract :
Clone detection techniques essentially cluster textually, syntactically, and/or semantically similar code fragments within or across software systems. For large datasets, similarity identification is costly in both time and memory. The cost is especially high when detecting near-miss clones, in which lines may have been modified, added, and/or deleted in the copied fragments. The capability and effectiveness of a clone detection tool depend mostly on the code similarity measurement technique it uses. A variety of similarity measurement approaches have been used for clone detection, including fingerprint-based approaches, which have achieved varying degrees of success despite some limitations. In this paper, we investigate the effectiveness of simhash, a state-of-the-art fingerprint-based data similarity measurement technique, for detecting both exact and near-miss clones in large-scale software systems. Our experimental data show that simhash is indeed effective in identifying various types of clones in a software system despite wide variations in experimental circumstances. The approach is also suitable as a core capability for building other tools, such as tools for incremental clone detection, code searching, and clone management.
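To illustrate how a simhash fingerprint can flag similar code fragments, here is a minimal, generic sketch of Charikar-style simhash. It is not the paper's actual pipeline. The tokenizer, the use of overlapping token trigrams as features, MD5-derived 64-bit feature hashes, and per-occurrence (frequency) weighting are all illustrative assumptions. The names `tokenize`, `features`, `feature_hash`, `simhash`, and `hamming` are hypothetical. Each feature's hash votes +1 or -1 on every bit position, and the sign of each bit's total becomes that bit of the fingerprint. Similar fragments therefore tend to get fingerprints with a small Hamming distance.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width (a common choice for simhash)


def tokenize(code: str) -> list[str]:
    # Naive lexer (assumption): identifiers/keywords, integer literals,
    # and any other single non-whitespace character. Whitespace is dropped,
    # so pure layout changes produce identical token streams.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def features(code: str, n: int = 3) -> list[str]:
    # Overlapping token n-grams serve as features (assumed feature choice).
    toks = tokenize(code)
    if len(toks) < n:
        return [" ".join(toks)]
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def feature_hash(feature: str) -> int:
    # 64-bit hash taken from the first 8 bytes of an MD5 digest.
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(code: str, n: int = 3) -> int:
    # Each feature occurrence adds +1 to the bit positions set in its hash
    # and -1 to the positions that are clear (weight = occurrence count).
    votes = [0] * BITS
    for f in features(code, n):
        h = feature_hash(f)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the summed vote is positive.
    fingerprint = 0
    for i in range(BITS):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")
```

For example, `hamming(simhash("int sum(int a, int b) { return a + b; }"), simhash("int  sum(int a,int b){return a+b;}"))` is 0, because the two fragments differ only in layout. In a clone detector built on this idea, two fragments would be reported as near-miss clone candidates when their fingerprints differ in at most some threshold number of bits. That threshold is a hypothetical tuning parameter here, not a value taken from the paper.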
Keywords :
cryptography; large-scale systems; software engineering; state-of-the-art fingerprint; data similarity measurement; fingerprint based approach; large scale software systems; near-miss clones detection; semantically similar code fragments; simhash; Cloning; Clustering algorithms; Complexity theory; Fingerprint recognition; Indexing; Software systems; clone detection; fingerprinting; simhash; similarity hashing; software clones;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reverse Engineering (WCRE), 2011 18th Working Conference on
Conference_Location :
Limerick
ISSN :
1095-1350
Print_ISBN :
978-1-4577-1948-6
Type :
conf
DOI :
10.1109/WCRE.2011.12
Filename :
6079770