DocumentCode
1614940
Title
Distributed pagerank for P2P systems
Author
Sankaralingam, Karthikeyan ; Sethumadhavan, Simha ; Browne, James C.
Author_Institution
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
fYear
2003
Firstpage
58
Lastpage
68
Abstract
This paper defines and describes a fully distributed implementation of Google\´s highly effective pagerank algorithm, for "peer to peer" (P2P) systems. The implementation is based on chaotic (asynchronous) iterative solution of linear systems. The P2P implementation also enables incremental computation of pageranks as new documents are entered into or deleted from the network. Incremental update enables continuously accurate pageranks whereas the currently centralized web crawl and computation over Internet documents requires several days. This suggests possible applicability of the distributed algorithm to pagerank computations as a replacement for the centralized Web crawler based implementation for Internet documents. A complete solution of the distributed pagerank computation for an in-place network converges rapidly (1% accuracy in 10 iterations) for large systems although the time for iteration may be long. The incremental computation resulting from addition of a single document converges extremely rapidly, typically requiring update path lengths of fewer than 15 nodes even for large networks and very accurate solutions. This implementation of pagerank provides a uniform ranking scheme for documents in P2P systems, and its integration with P2P keyword search provides one solution to the network traffic problems engendered by return of document hits. In basic P2P keyword search, all the document hits must be returned to the querying node causing large network traffic. An incremental keyword search algorithm for P2P keyword search where document hits are sorted by pagerank, and incrementally returned to the querying node is proposed and evaluated. Integration of this algorithm into P2P keyword search can produce dramatic benefit both in terms of effectiveness for users and decrease in network traffic. The incremental search algorithm provided approximately a ten-fold reduction in network traffic for two-word and three-word querying.
Keywords
Internet; client-server systems; document handling; information retrieval; search engines; Googles pagerank algorithm; Internet documents; P2P keyword search; P2P systems; Web crawler; chaotic iterative solution; distributed algorithm; distributed pagerank; in-place network; incremental computation; incremental keyword search algorithm; incremental update; linear systems; network traffic; pagerank computations; peer to peer; querying node; Chaos; Computer networks; Distributed algorithms; Distributed computing; Internet; Iterative algorithms; Keyword search; Linear systems; Peer to peer computing; Telecommunication traffic;
fLanguage
English
Publisher
ieee
Conference_Titel
High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on
ISSN
1082-8907
Print_ISBN
0-7695-1965-2
Type
conf
DOI
10.1109/HPDC.2003.1210016
Filename
1210016
Link To Document