Title :
A practical comparison of edit distance approximation algorithms
Author :
Hanada, Hiroyuki ; Nakamura, Atsuyoshi ; Kudo, Mineichi
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
Abstract :
The edit distance is a basic string similarity measure for many applications such as string searching, text mining, signal processing, bioinformatics and so on. However, its high computational cost often prevents it from being used for a large set of strings like similar string searching. A promising solution for the problem is to approximate the edit distance with low computational cost. However, although there are many methods for approximating the edit distance, most of them are analyzed only theoretically. In fact, most of the methods can evaluate the edit distance only in terms of order notations, and do not conduct any experiment. This is a large obstacle for applying them to real applications. In this study we will first list up existing edit distance approximation methods. Then we compare them by: (i) approximation performances shown by the original authors, (ii) approximation performances re-analyzed by us (concrete values instead of the order notations) and (iii) experimental performances by our implementations.
Keywords :
approximation theory; data mining; string matching; text analysis; approximation performance; bioinformatics; edit distance approximation algorithm; high computational cost; low computational cost; order notation; signal processing; similarity measure; string searching; text mining; Approximation algorithms; Bioinformatics; Computational efficiency; Concrete; Function approximation; Vectors; approximate string matching; edit distance; string replacement; string searching; string vectorization;
Conference_Titel :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122599