DocumentCode
2452746
Title
An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance
Author
Lavoie, Thierry ; Merlo, Ettore
Author_Institution
Dept. de genie Inf. et logiciel, Ecole Polytech. de Montreal, Montreal, QC, Canada
fYear
2012
fDate
4-4 June 2012
Firstpage
1
Lastpage
7
Abstract
This paper presents an original clone detection technique which is an accurate approximation of the Levenshtein distance. It uses groups of tokens extracted from source code called windowed-tokens. From these, frequency vectors are then constructed and compared with the Manhattan distance in a metric tree. The goal of this new technique is to provide a very high precision clone detection technique while keeping a high recall. Precision and recall measurement is done with respect to the Levenshtein distance. The testbench is a large scale open source software. The collected results proved the technique to be fast, simple, and accurate. Finally, this article presents further research opportunities.
Keywords
estimation theory; program compilers; program testing; public domain software; trees (mathematics); Levenshtein distance; Manhattan distance; accurate approximation; accurate estimation; frequency vectors; large scale open source software; metric trees; original clone detection technique; precision clone detection technique; precision measurement; recall measurement; research opportunity; source code; testbench; windowed-tokens; Algorithm design and analysis; Cloning; Measurement; Software; Software algorithms; Syntactics; Vectors; Clone detection; Levenshtein distance; Manhattan distance; Software clones;
fLanguage
English
Publisher
ieee
Conference_Titel
Software Clones (IWSC), 2012 6th International Workshop on
Conference_Location
Zurich
Print_ISBN
978-1-4673-1794-8
Type
conf
DOI
10.1109/IWSC.2012.6227861
Filename
6227861
Link To Document