DocumentCode :
1545029
Title :
Fast retrieval of electronic messages that contain mistyped words or spelling errors
Author :
Wang, Jason Tsong-Li ; Chang, Chia-Yo
Author_Institution :
Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
Volume :
27
Issue :
3
fYear :
1997
fDate :
6/1/1997 12:00:00 AM
Firstpage :
441
Lastpage :
451
Abstract :
This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their “fingerprints” into a number of “fingerprint files.” When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound
Keywords :
database management systems; electronic mail; electronic messaging; file organisation; indexing; query processing; search problems; string matching; database; electronic message retrieval; fingerprint files; hashing; histogram; index structure; lower bound; mistyped words; pruning; query string; search key; searching; spelling errors; Bibliographies; Databases; Dictionaries; Fingerprint recognition; Histograms; Information retrieval; Information science; Voting;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
ieee
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/3477.584951
Filename :
584951
Link To Document :
بازگشت