DocumentCode :
2958867
Title :
Finding the most similar documents across multiple text databases
Author :
Yu, Clement ; Liu, King-Lup ; Wu, Wensheng ; Meng, Weiyi ; Rishe, Naphtali
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA
fYear :
1999
fDate :
1999
Firstpage :
150
Lastpage :
162
Abstract :
We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
Keywords :
database management systems; information retrieval; search engines; text analysis; database ranking; document retrieval; most similar documents; multiple text databases; relative performance; statistical method; Australia; Computer networks; Database systems; ISDN; Indexing; Information retrieval; Information systems; Internet; Machine learning; Transaction databases
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Research and Technology Advances in Digital Libraries, 1999. Proceedings. IEEE Forum on
Conference_Location :
Baltimore, MD
ISSN :
1092-9959
Print_ISBN :
0-7695-0219-9
Type :
conf
DOI :
10.1109/ADL.1999.777710
Filename :
777710