DocumentCode
3292470
Title
A Contextual Normalised Edit Distance
Author
de la Higuera, C. ; Mico, Luisa
Author_Institution
Univ. de St.-Etienne, St. Etienne
fYear
2008
fDate
11-12 April 2008
Firstpage
61
Lastpage
68
Abstract
In order to better fit a variety of pattern recognition problems over strings, using a normalised version of the edit or Levenshtein distance is considered to be an appropriate approach. The goal of normalisation is to take into account the lengths of the strings. We define a new normalisation, contextual, where each edit operation is divided by the length of the string on which the edit operation takes place. We prove that this contextual edit distance is a metric and that it can be computed through an extension of the usual dynamic programming algorithm for the edit distance. We also provide a fast heuristic which nearly always returns the same result and we show over several experiments that the distance obtains good results in classification tasks and has a low intrinsic dimension in comparison with other normalised edit distances.
Keywords
dynamic programming; string matching; Levenshtein distance; contextual normalised edit distance; dynamic programming algorithm; string pattern recognition problems; Dynamic programming; Heuristic algorithms; Pattern recognition; Edit distance; classification; intrinsic dimension; metrics; nearest neighbour; normalisation;
fLanguage
English
Publisher
ieee
Conference_Titel
Similarity Search and Applications, 2008. SISAP 2008. First International Workshop on
Conference_Location
Belfast
Print_ISBN
0-7695-3101-6
Type
conf
DOI
10.1109/SISAP.2008.17
Filename
4492926
Link To Document