Title :
Scalable link analysis algorithm for a distributed memory environment
Author :
Arulraj, Joy James Prabhu
Author_Institution :
DCSE, Anna Univ., Chennai, India
Abstract :
The convergence rate of iterative stationary methods for the PageRank linear system is significant for efficient computation of the relative rank of web pages based on the Web graph. This paper investigates this problem from the perspective of a message-passing framework on a distributed memory parallel programming model. A three-dimensional Web model is introduced to present a scalable link analysis algorithm that can compute the rank vector of a linear system with a high degree of convergence and accuracy. The algorithm derives its scalability by eliminating the need for any global information or page categorization. The details of the MPI-based parallel implementation and the experimental results obtained on a 8-node cluster of Intel Xeon 3.0 GHz, 1.5GB RAM machines over a 100MBit Ethernet LAN are provided. The variation in convergence behavior with residual probability is analyzed. For the WebBase crawl, a speedup of 12.45 is obtained by the scalable algorithm over the centralized algorithm.
Keywords :
Web sites; convergence; graph theory; iterative methods; message passing; parallel programming; probability; search engines; Ethernet LAN; Intel Xeon; MPI; PageRank linear system; Web graph; Web pages ranking; WebBase crawl; centralized algorithm; convergence rate; distributed memory parallel programming model; frequency 3.0 GHz; iterative stationary method; memory size 1.5 GByte; message-passing framework; residual probability; scalable algorithm; scalable link analysis algorithm; three-dimensional Web model; Algorithm design and analysis; Clustering algorithms; Convergence; Grid computing; Iterative methods; Linear systems; Markov processes;
Conference_Titel :
Parallel Distributed and Grid Computing (PDGC), 2010 1st International Conference on
Conference_Location :
Solan
Print_ISBN :
978-1-4244-7675-6
DOI :
10.1109/PDGC.2010.5679603