• DocumentCode
    2075430
  • Title

    Approximating edit distance efficiently

  • Author

    Bar-Yossef, Ziv ; Jayram, T.S. ; Krauthgamer, Robert ; Kumar, Ravi

  • Author_Institution
    Dept. of Electr. Eng., Technion, Haifa, Israel
  • fYear
    2004
  • fDate
    17-19 Oct. 2004
  • Firstpage
    550
  • Lastpage
    559
  • Abstract
    Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than ℓ, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)23/ gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. ℓ gap problem, where ℓ can be as small as O(k2) - still with a constant sketch - but works only for strings that are mildly "nonrepetitive". Finally, we develop an n37/-approximation quasilinear time algorithm for edit distance, improving the previous best factor of n34/ (Cole and Hariharan, 2002); if the input strings are assumed to be nonrepetitive, then the approximation factor can be strengthened to n13/.
  • Keywords
    approximation theory; computational complexity; string matching; approximate nearest-neighbor schemes; approximation factor; approximation quasilinear time algorithm; edit distance approximation; gap versions; low-distortion embeddings; natural algorithmic problems; nonrepetitive strings; sketching algorithms; Algorithm design and analysis; Approximation algorithms; Bioinformatics; Books; Computational biology; Dynamic programming; Genomics; Heuristic algorithms; Text processing; Web search;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on
  • ISSN
    0272-5428
  • Print_ISBN
    0-7695-2228-9
  • Type

    conf

  • DOI
    10.1109/FOCS.2004.14
  • Filename
    1366275