• DocumentCode
    869989
  • Title

    Markov edit distance

  • Author

    Wei, Jie

  • Author_Institution
    City Coll., City Univ. of New York, NY, USA
  • Volume
    26
  • Issue
    3
  • fYear
    2004
  • fDate
    3/1/2004 12:00:00 AM
  • Firstpage
    311
  • Lastpage
    321
  • Abstract
    Edit distance was originally developed by Levenstein several decades ago to measure the distance between two strings. It was found that this distance can be computed by an elegant dynamic programming procedure. The edit distance has played important roles in a wide array of applications due to its representational efficacy and computational efficiency. To effect a more reasonable distance measure, the normalized edit distance was proposed. Many algorithms and studies have been dedicated along this line with impressive performances in recent years. There is, however, a fundamental problem with the original definition of edit distance that has remained elusive: its context-free nature. In determining the possible actions, i.e., insertion, deletion, and substitution, no systematic consideration was given to the local behaviors of the string/pattern in question that indeed encompass great amount of useful information regarding its content. In this paper, inspired by the success of the Markov Random Field theory, a new edit distance called Markov edit distance (MED) within the dynamic programming framework is proposed to take full advantage of the local statistical dependencies in the pattern in order to arrive at enhanced matching performance. Within this framework, two specialized distance measures are developed: The reshuffling MED to handle cases where a subpattern in the target pattern is the reshuffles of that in the source pattern, and the coherence MED which is able to incur local content based substitution, insertion, and deletion. The applications based on these two MEDs in string matching are then explored, whereof encouraging empirical results have been observed.
  • Keywords
    Markov processes; dynamic programming; string matching; text analysis; word processing; Markov edit distance; Markov random field theory; coherence MED; computational efficiency; elegant dynamic programming; representational efficacy; reshuffling MED; string matching; Application software; Computational efficiency; Computer Society; Cost function; Dynamic programming; Equations; Markov random fields; Pattern matching; Pattern recognition; Text processing; Algorithms; Artificial Intelligence; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Markov Chains; Models, Statistical; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Signal Processing, Computer-Assisted; Subtraction Technique;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2004.1262315
  • Filename
    1262315