• DocumentCode
    3517592
  • Title

    A comparison of techniques for selecting text collections

  • Author

    D'Souza, Daryl J. ; Thom, James A. ; Zobel, Justin

  • Author_Institution
    Dept. of Comput. Sci., R. Melbourne Inst. of Technol., Vic., Australia
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    28
  • Lastpage
    32
  • Abstract
    Techniques for evaluating queries against a distributed text document database allow uniform access to its component collections. One such technique is to first choose a subset of collections, via a selection index. The index captures information about each collection such as which terms occur in which documents, term statistics, and collection statistics. A possible implementation of such an index is a lexicon, which maintains a complete list of terms in the database. Another approach is to partially index the database by extracting fewer terms but maintaining some information about each document. In this paper we explore three collection-ranking techniques, two based on lexicons and the other based on partial document indexes. Our experiments show that in most cases the lexicon approaches outperform the partial index approach.
  • Keywords
    information retrieval systems; query processing; collection statistics; collection-ranking techniques; distributed text document database; lexicons; partial document indexes; partial index approach; queries evaluation; selection index; term statistics; text collections selection; Computer science; Distributed databases; Indexing; Information retrieval; Nominations and elections; Search engines; Statistics; Testing; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Conference, 2000. ADC 2000. Proceedings. 11th Australasian
  • Conference_Location
    Canberra, ACT
  • Print_ISBN
    0-7695-0528-7
  • Type

    conf

  • DOI
    10.1109/ADC.2000.819810
  • Filename
    819810