Title :
Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys
Author :
Podnar, I. ; Rajman, M. ; Luu, T. ; Klemm, F. ; Aberer, Karl
Author_Institution :
Sch. of Comput. & Commun. Sci., Ecole Polytechnique Federale de Lausanne, Switzerland
Abstract :
The suitability of peer-to-peer (P2P) approaches for full-text Web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we formalize a novel indexing/retrieval model that achieves high-performance, cost-efficient retrieval by indexing with highly discriminative keys (HDKs) stored in a distributed global index maintained in a structured P2P network. HDKs correspond to carefully selected terms and term sets appearing in a small number of collection documents. We provide a theoretical analysis of the scalability of our retrieval model and report experimental results obtained with our HDK-based P2P retrieval engine. These results show that, despite increased indexing costs, the total traffic generated with the HDK approach is significantly smaller than that generated with distributed single-term indexing strategies. Furthermore, our experiments show that the retrieval performance obtained with a random set of real queries is comparable to that of a centralized, single-term solution using the state-of-the-art BM25 relevance computation scheme. Finally, our scalability analysis demonstrates that the HDK approach can scale to large networks of peers indexing Web-size document collections, thus opening the way towards viable, truly decentralized Web retrieval.
Keywords :
Internet; document handling; full-text databases; indexing; information retrieval; peer-to-peer computing; Web-size document collections; cost-efficient retrieval; distributed global index; full-text Web retrieval; highly discriminative keys; peer-to-peer Web retrieval; scalability analysis; structured P2P network; Bandwidth; Costs; Engines; Indexing; Information retrieval; Peer to peer computing; Prototypes; Scalability; Traffic control; Vocabulary
Conference_Title :
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0802-4
DOI :
10.1109/ICDE.2007.368968