مرکز منطقه ای اطلاع رساني علوم و فناوري

DocumentCode :

754598

Title :

The Noisy Substring Matching Problem

Author :

Kashyap, R.L. ; Oommen, B. John

Author_Institution :

School of Electrical Engineering, Purdue University

Issue :

fYear :

1983

fDate :

5/1/1983 12:00:00 AM

Firstpage :

365

Lastpage :

370

Abstract :

Let T(U) be the set of words in the dictionary H which contains U as a substring. The problem considered here is the estimation of the set T(U) when U is not known, but Y, a noisy version of U is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that its every element contains at least one substring which resembles Y most according to the Levenshtein metric. The proposed algorithm for-the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed as the kth distance between two strings X and Y which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y) is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries which are subsets of 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.

Keywords :

Error correction in strings; Levenshtein metric; noisy substring matching; string dissimilarity in terms of the dissimilarity of their substrings; string set estimation; text editing; Computer science; Databases; Dictionaries; Information retrieval; Recursive estimation; Error correction in strings; Levenshtein metric; noisy substring matching; string dissimilarity in terms of the dissimilarity of their substrings; string set estimation; text editing;

fLanguage :

English

Journal_Title :

Software Engineering, IEEE Transactions on

Publisher :

ieee

ISSN :

0098-5589

Type :

jour

DOI :

10.1109/TSE.1983.237018

Filename :

1703065

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=754598