Title :
Fast retrieval of electronic documents in digital libraries
Author :
Wang, Jason T L ; Chang, Chia-Yo
Author_Institution :
Dept. of Comput. & Inf. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
Abstract :
This paper presents an index structure for retrieving electronic documents in digital libraries. The documents considered may contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those documents that approximately contain the query, i.e., a certain number of insertions, deletions and mismatches are allowed when matching the query with a word (or phrase) in the documents. Our approach is to store the documents sequentially in a database and hash their “fingerprints” into a number of “fingerprint files”. When a query is given, its fingerprints are also hashed into the files, and a histogram of votes is constructed over the documents. We derive a lower bound on the number of votes a qualifying document must receive, which allows a large number of nonqualifying documents (i.e., those whose votes fall below the lower bound) to be pruned during searching. The paper presents experimental results that demonstrate the effectiveness of the index structure and the lower bound.
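The abstract does not say how fingerprints are formed or what the lower bound is, so the following is only a minimal sketch of the general vote-and-prune idea under stated assumptions: fingerprints are taken to be overlapping q-grams, the "fingerprint files" are hash buckets keyed by CRC32, and the pruning threshold is the standard q-gram lemma bound (m - q + 1) - k*q for a query of length m with at most k edit errors. None of these choices, nor the function names, are taken from the paper.

```python
import zlib
from collections import Counter, defaultdict


def qgrams(text, q):
    """Overlapping substrings of length q (assumed stand-in for 'fingerprints')."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def bucket(gram, n_files):
    """Deterministically hash a q-gram to one of n_files buckets."""
    return zlib.crc32(gram.encode("utf-8")) % n_files


def build_index(docs, q=2, n_files=4096):
    """Map each bucket ('fingerprint file') to the ids of documents hashed into it."""
    files = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in qgrams(text.lower(), q):
            files[bucket(gram, n_files)].add(doc_id)
    return files


def vote_threshold(m, q, k):
    """q-gram lemma: a match with at most k errors keeps at least this many q-grams."""
    return (m - q + 1) - k * q


def candidates(files, query, k, n_docs, q=2, n_files=4096):
    """Build the vote histogram and keep documents that reach the threshold."""
    votes = Counter()
    for gram in qgrams(query.lower(), q):
        for doc_id in files.get(bucket(gram, n_files), ()):
            votes[doc_id] += 1  # one vote per query q-gram found in the bucket
    t = vote_threshold(len(query), q, k)
    if t <= 0:
        # The bound is vacuous, so nothing can be pruned safely.
        return set(range(n_docs)), votes
    return {d for d, v in votes.items() if v >= t}, votes


docs = [
    "the quick brown fox",
    "digital libaries store documents",  # 'libraries' with one letter dropped
    "nothing relevant here",
]
index = build_index(docs)
survivors, votes = candidates(index, "libraries", k=1, n_docs=len(docs))
```

Hash collisions can only add votes, never remove them, so under these assumptions the filter produces no false dismissals; surviving documents would still need to be checked with an exact approximate-matching routine.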
Keywords :
document handling; full-text databases; indexing; libraries; library automation; query processing; string matching; digital libraries; document database; electronic document retrieval; fingerprint files; histogram; index structure; lower bound; mistyped words; query string; searching; spelling errors; Bibliographies; Deductive databases; Dictionaries; Electronic mail; Fingerprint recognition; Indexes; Information retrieval; Information science; Software libraries; Voting
Conference_Title :
Tools with Artificial Intelligence, 1995. Proceedings., Seventh International Conference on
Conference_Location :
Herndon, VA
Print_ISBN :
0-8186-7312-5
DOI :
10.1109/TAI.1995.479516