• DocumentCode
    2718207
  • Title

    DHTs over Peer Clusters for Distributed Information Retrieval

  • Author

    Papapetrou, Odysseas ; Siberski, Wolf ; Balke, Wolf-Tilo ; Nejdl, Wolfgang

  • Author_Institution
    L3S Res. Center, Leibniz Univ. Hannover, Hannover
  • fYear
    2007
  • fDate
    21-23 May 2007
  • Firstpage
    84
  • Lastpage
    93
  • Abstract
    Distributed hash tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However, building huge term indexes, as required for IR-style keyword search, is impractical with plain DHTs. Due to the large sizes of document term vocabularies, joining peers cause huge numbers of key inserts, and subsequently large numbers of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance. We show that this can be achieved by combining DHTs with peer clustering. Peers are first clustered into communities, each of the communities having a representative super-peer. Then all occurrences of a term in a community are published to the global DHT in a batch by the representative super-peer. Our evaluation shows that this reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
  • Keywords
    file organisation; indexing; peer-to-peer computing; query processing; workstation clusters; DHT; IR-style keyword search; distributed hash tables; distributed information retrieval; document term vocabularies; index maintenance; peer clusters; query processing; Costs; Indexing; Information retrieval; Keyword search; Peer to peer computing; Performance gain; Publishing; Query processing; Scalability; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications, 2007. AINA '07. 21st International Conference on
  • Conference_Location
    Niagara Falls, ON
  • ISSN
    1550-445X
  • Print_ISBN
    0-7695-2846-5
  • Type

    conf

  • DOI
    10.1109/AINA.2007.60
  • Filename
    4220880
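
The batching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the global DHT is modelled as an in-memory dictionary, one "message" is counted per key insert, and all names (`publish_naive`, `publish_clustered`, the sample peers and clusters) are hypothetical. It only shows why one batched insert per term per community needs fewer inserts than one insert per term per peer, while producing the same term index.

```python
from collections import defaultdict


def publish_naive(peers):
    """Every peer inserts each of its terms into the DHT itself.

    Returns (index, message_count), where index maps term -> set of peer ids.
    """
    index = defaultdict(set)
    messages = 0
    for peer_id, terms in peers.items():
        for term in terms:
            index[term].add(peer_id)
            messages += 1  # assumption: one DHT insert per (peer, term)
    return index, messages


def publish_clustered(peers, clusters):
    """A super-peer per cluster publishes each distinct term once, in a batch.

    Returns (index, message_count) with the same index layout as above.
    """
    index = defaultdict(set)
    messages = 0
    for members in clusters.values():
        # The super-peer first collects all term occurrences in its community.
        batch = defaultdict(set)
        for peer_id in members:
            for term in peers[peer_id]:
                batch[term].add(peer_id)
        # Then it publishes one batched entry per distinct term.
        for term, holders in batch.items():
            index[term] |= holders
            messages += 1  # assumption: one DHT insert per (cluster, term)
    return index, messages


# Hypothetical toy data: four peers grouped into two communities.
peers = {
    "p1": {"dht", "index"},
    "p2": {"dht", "query"},
    "p3": {"index", "query"},
    "p4": {"dht"},
}
clusters = {"c1": ["p1", "p2"], "c2": ["p3", "p4"]}

naive_index, naive_msgs = publish_naive(peers)
clustered_index, clustered_msgs = publish_clustered(peers, clusters)
print(naive_msgs, clustered_msgs, naive_index == clustered_index)
```

On this toy data the saving is small; the sketch suggests it grows as more peers in a community share the same terms, since each shared term is published only once per community.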