Title :
Normalized Sum-over-Paths Edit Distances
Author :
Diez, Silvia Garcia ; Fouss, Francois ; Shimbo, Masashi ; Saerens, Marco
Author_Institution :
Unite de Syst. d´´Inf. (ISYS), Univ. Catholique de Louvain, Louvain-la-Neuve, Belgium
Abstract :
In this paper, normalized SoP string-edit distances, taking into account all possible alignments between two sequences, are investigated. These normalized distances are variants of the Sum-over-Paths (SoP) distances which compute the expected cost on all sequence alignments by favoring low-cost ones - therefore favoring good alignment. Such distances consider two sequences tied by many optimal or nearly-optimal alignments as more similar than two sequences sharing only one, optimal, alignment. They depend on a parameter, θ, and reduce to the standard distances - the edit-distance or the longest common subsequence - when θ → 0, while having the same time complexity. This paper puts the emphasis on applying some type of normalization of the expectation of the cost. Experimental results for clustering and classification tasks performed on four OCR data sets show that (i) the applied normalization generally improves the existing results, and (ii) as for the SoP edit-distances, the normalized SoP edit-distances clearly outperform the non-randomized measures, i.e. the standard edit distance and longest common subsequence.
Keywords :
computational complexity; optical character recognition; pattern classification; pattern clustering; string matching; OCR data sets; classification tasks; clustering task; longest common subsequence; nearly-optimal alignments; normalization; normalized SoP string-edit distances; normalized distances; normalized sum-over-paths edit distances; sequence alignments; standard edit distance; time complexity; Electronic mail; Indexes; Kernel; Light emitting diodes; Pattern recognition; Presses; Support vector machines; edit distance; longest common subsequence; normalization; randomized-shortest paths;
Conference_Titel :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.261