DocumentCode
869989
Title
Markov edit distance
Author
Wei, Jie
Author_Institution
City Coll., City Univ. of New York, NY, USA
Volume
26
Issue
3
fYear
2004
fDate
3/1/2004 12:00:00 AM
Firstpage
311
Lastpage
321
Abstract
Edit distance was originally developed by Levenstein several decades ago to measure the distance between two strings. It was found that this distance can be computed by an elegant dynamic programming procedure. The edit distance has played important roles in a wide array of applications due to its representational efficacy and computational efficiency. To effect a more reasonable distance measure, the normalized edit distance was proposed. Many algorithms and studies have been dedicated along this line with impressive performances in recent years. There is, however, a fundamental problem with the original definition of edit distance that has remained elusive: its context-free nature. In determining the possible actions, i.e., insertion, deletion, and substitution, no systematic consideration was given to the local behaviors of the string/pattern in question that indeed encompass great amount of useful information regarding its content. In this paper, inspired by the success of the Markov Random Field theory, a new edit distance called Markov edit distance (MED) within the dynamic programming framework is proposed to take full advantage of the local statistical dependencies in the pattern in order to arrive at enhanced matching performance. Within this framework, two specialized distance measures are developed: The reshuffling MED to handle cases where a subpattern in the target pattern is the reshuffles of that in the source pattern, and the coherence MED which is able to incur local content based substitution, insertion, and deletion. The applications based on these two MEDs in string matching are then explored, whereof encouraging empirical results have been observed.
Keywords
Markov processes; dynamic programming; string matching; text analysis; word processing; Markov edit distance; Markov random field theory; coherence MED; computational efficiency; elegant dynamic programming; representational efficacy; reshuffling MED; string matching; Application software; Computational efficiency; Computer Society; Cost function; Dynamic programming; Equations; Markov random fields; Pattern matching; Pattern recognition; Text processing; Algorithms; Artificial Intelligence; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Markov Chains; Models, Statistical; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Subtraction Technique;
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
ieee
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2004.1262315
Filename
1262315
Link To Document