Regional Information Center for Science and Technology (RICeST) - Ctcompare: Code clone detection using hashed token sequences

DocumentCode :

2453215

Title :

Ctcompare: Code clone detection using hashed token sequences

Author :

Toomey, Warren

Author_Institution :

Sch. of IT, Bond Univ., Robina, QLD, Australia

fYear :

2012

fDate :

4 June 2012

Firstpage :

Lastpage :

Abstract :

There is much research on the use of tokenized source code to find code clones both within and between trees of source code. Some approaches have used suffix trees [1], [3]; others have used variations of longest common substring algorithms [4], [5]. This paper outlines an algorithm, embodied in a new tool called ctcompare, that takes a different tokenization approach. Each code base to be compared is first lexically analysed to produce a sequence of tokens. These are then broken into overlapping tuples of N consecutive tokens. The tuples are then hashed and the hash values of token tuples are used to identify type-1 and type-2 clone pairs. Hashed token sequences combined with a database have already been used in earlier ctcompare versions and elsewhere [2], but with a significant performance penalty due to database insertions. The benefits of this approach over the existing research include the simultaneous comparison of multiple large code bases and fast absolute performance.
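
The pipeline described in the abstract (lex, split into overlapping N-token tuples, hash, match) can be sketched roughly as follows. This is only an illustrative sketch, not the ctcompare implementation: the regex lexer, the `KEYWORDS` set, the `ID`/`NUM` placeholders used to catch type-2 clones, the choice of SHA-1, and the function names are all assumptions made for the example, since the abstract does not specify them.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set; a real lexer would be language-aware.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Rough lexer: identifiers/keywords, integer literals, single punctuation characters."""
    return TOKEN_RE.findall(source)


def normalize(tokens):
    """Replace identifiers and numbers with placeholders so renamed copies still match."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def tuple_hashes(tokens, n):
    """Yield (hash, start_index) for every overlapping window of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        window = "\x00".join(tokens[i:i + n])
        yield hashlib.sha1(window.encode("utf-8")).hexdigest(), i


def find_clone_candidates(sources, n=8):
    """Map each tuple hash to the (file, token index) locations where it occurs.

    `sources` is a dict of file name -> source text. Only hashes seen in at
    least two locations are returned; these are candidate clone fragments.
    """
    index = defaultdict(list)
    for name, text in sources.items():
        tokens = normalize(tokenize(text))
        for digest, i in tuple_hashes(tokens, n):
            index[digest].append((name, i))
    return {h: locs for h, locs in index.items() if len(locs) > 1}


if __name__ == "__main__":
    demo = {
        "a.c": "int total = count + 1;",
        "b.c": "int sum = items + 2;",
    }
    for digest, locations in find_clone_candidates(demo, n=5).items():
        print(digest[:8], locations)
```

Because all code bases share one in-memory hash index, several trees can be compared in a single pass. Turning runs of adjacent matching tuples into maximal clone pairs, and separating type-1 from type-2 matches, would need further steps that this sketch leaves out.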

Keywords :

cryptography; source coding; trees (mathematics); code clone detection; ctcompare; hashed token sequences; suffix trees; tokenized source code; Algorithm design and analysis; Australia; Cloning; Databases; Educational institutions; Redundancy; Time measurement; clone detection; code clone; code redundancy; hash function; software

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2012 6th International Workshop on Software Clones (IWSC)

Conference_Location :

Zurich

Print_ISBN :

978-1-4673-1794-8

Type :

conf

DOI :

10.1109/IWSC.2012.6227881

Filename :

6227881

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2453215