Title :
Kernels based on weighted Levenshtein distance
Author :
Xu, Jianhua ; Zhang, Xuegong
Author_Institution :
Sch. of Math. & Comput. Sci., Nanjing Normal Univ., China
Abstract :
In some real-world applications, a sample is described as a string of symbols rather than a vector of real numbers. Many training algorithms then need to determine the similarity or dissimilarity of two strings. A widely used notion of similarity between two strings of different lengths is the weighted Levenshtein distance (WLD), defined as the minimum total weight of the single-symbol insertions, deletions and substitutions required to transform one string into the other. To incorporate prior knowledge about strings into the kernels used in support vector machines and other kernel machines, we use variants of this distance to replace the distance measure in the RBF and exponential kernels and the inner product in the polynomial and sigmoid kernels, forming a new class of string kernels: Levenshtein kernels. Combining our new kernels with a support vector machine, the error rate and variance on the UCI splice site recognition dataset over 20 runs is 5.88±0.53, which is better than the best result, 9.5±0.7, obtained by five other training algorithms.
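The distance and the distance-based kernels described in the abstract can be illustrated with a minimal sketch. It assumes a single constant weight per operation type (`w_ins`, `w_del`, `w_sub`) and the textbook kernel forms exp(-gamma * d^2) and exp(-gamma * d); the function names, the parameter names and the default `gamma` are hypothetical, and the variants actually used in the paper may differ.

```python
import math


def weighted_levenshtein(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of insertions, deletions and substitutions
    that transform string a into string b (dynamic programming).

    Assumption: one constant weight per operation type; symbol-specific
    weights would replace the three constants with lookup tables.
    """
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # curr[0] = cost of deleting the first i symbols of a
        curr = [i * w_del]
        for j, cb in enumerate(b, start=1):
            cost_sub = prev[j - 1] + (0.0 if ca == cb else w_sub)
            cost_del = prev[j] + w_del
            cost_ins = curr[j - 1] + w_ins
            curr.append(min(cost_sub, cost_del, cost_ins))
        prev = curr
    return prev[-1]


def levenshtein_rbf_kernel(a, b, gamma=0.1, **weights):
    """RBF-style kernel with the Euclidean distance replaced by the WLD."""
    d = weighted_levenshtein(a, b, **weights)
    return math.exp(-gamma * d * d)


def levenshtein_exponential_kernel(a, b, gamma=0.1, **weights):
    """Exponential-style kernel with the distance replaced by the WLD."""
    d = weighted_levenshtein(a, b, **weights)
    return math.exp(-gamma * d)
```

With unit weights this reduces to the ordinary edit distance, so `weighted_levenshtein("kitten", "sitting")` is 3.0. Raising `w_sub` above `w_ins + w_del` makes a deletion followed by an insertion cheaper than a substitution, which is one way prior knowledge about the symbols could enter the kernel.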
Keywords :
pattern recognition; radial basis function networks; support vector machines; exponential kernels; kernel machines; polynomial kernels; radial basis function; sigmoid kernels; support vector machine; training algorithms; weighted Levenshtein distance; Application software; DNA; Error analysis; Hidden Markov models; Kernel; Machine learning algorithms; Pattern recognition; Sequences; Support vector machines; Text categorization;
Conference_Title :
Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on
Print_ISBN :
0-7803-8359-1
DOI :
10.1109/IJCNN.2004.1381147