  • DocumentCode
    866764
  • Title
    A methodology to retrieve text documents from multiple databases
  • Author
    Yu, Clement; Liu, King-Lup; Meng, Weiyi; Wu, Zonghuan; Rishe, Naphtali
  • Author_Institution
    Dept. of Comput. Sci., Illinois Univ., Chicago, IL, USA
  • Volume
    14
  • Issue
    6
  • fYear
    2002
  • Firstpage
    1347
  • Lastpage
    1361
  • Abstract
    This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, the contents of databases are indicated approximately by database representatives. Databases are ranked using their representatives with respect to the given query. We provide a necessary and sufficient condition to rank the databases optimally. In order to satisfy this condition, we provide three estimation methods. One estimation method is intended for short queries; the other two are for all queries. Second, we provide an algorithm, OptDocRetrv, to retrieve documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of other databases, our methodology will guarantee the retrieval of the n most similar documents for the query. When the number of databases is large, we propose to organize database representatives into a hierarchy and employ a best-search algorithm to search the hierarchy. It is shown that the effectiveness of the best-search algorithm is the same as that of evaluating the user query against all database representatives.
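    The abstract combines two ideas: rank databases for a query, then retrieve from them in rank order until the n most similar documents are guaranteed to be found. The sketch below illustrates that interplay under loudly stated assumptions, and it is not the published OptDocRetrv pseudocode. It assumes that the global similarity of every document to the query is known exactly, whereas the paper estimates database usefulness from database representatives. It also assumes that databases are ranked in descending order of the similarity of their single most similar document. The retrieval loop is one plausible reading of the stopping idea. Let the threshold be the smallest of the top similarities among the databases invoked so far, and collect every document at or above it. Once at least n such documents exist, no document in an uninvoked database can outrank them. The names `rank_databases`, `retrieve_top_n`, and the dictionary data model are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data model (assumption): database name -> {document id: global similarity}.
Database = Dict[str, float]


def top_similarity(db: Database) -> float:
    """Similarity of the most similar document in one database (0.0 if empty)."""
    return max(db.values(), default=0.0)


def rank_databases(dbs: Dict[str, Database]) -> List[str]:
    """Assumed ranking: descending order of each database's most similar document."""
    return sorted(dbs, key=lambda name: top_similarity(dbs[name]), reverse=True)


def retrieve_top_n(dbs: Dict[str, Database], ranking: List[str], n: int) -> List[Tuple[str, float]]:
    """Invoke databases in rank order until n documents clear the current threshold.

    A sketch of the stopping idea only, not the paper's exact algorithm.
    """
    selected: List[str] = []
    for name in ranking:
        selected.append(name)
        # Threshold: the smallest top similarity among the databases invoked so far.
        threshold = min(top_similarity(dbs[d]) for d in selected)
        candidates = [
            (doc, sim)
            for d in selected
            for doc, sim in dbs[d].items()
            if sim >= threshold
        ]
        if len(candidates) >= n:
            break
    else:
        # Every database was invoked and fewer than n documents cleared the
        # threshold, so fall back to all documents.
        candidates = [(doc, sim) for d in ranking for doc, sim in dbs[d].items()]
    # Highest similarity first; ties broken by document id for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:n]
```

    With databases A = {a1: 0.9, a2: 0.4, a3: 0.2}, B = {b1: 0.7, b2: 0.6}, and C = {c1: 0.5, c2: 0.45}, the ranking is A, B, C. For n = 3, the loop stops only after C is invoked (threshold 0.5), and it returns a1, b1, b2, which are also the global top three. Any document left in an uninvoked database has a similarity no higher than the threshold, so, ties aside, it cannot displace a retrieved document.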
  • Keywords
    Internet; distributed databases; full-text databases; information resources; information retrieval; search engines; OptDocRetrv; best-search algorithm; database selection; distributed information retrieval; metasearch; multiple databases; query processing; resource discovery; search engine; text document retrieval; Databases; Indexes; Information retrieval; Metasearch; Scattering; Search engines; Sufficient conditions;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2002.1047772
  • Filename
    1047772