• DocumentCode
    1925509
  • Title

    Scalable link analysis algorithm for a distributed memory environment

  • Author

    Arulraj, Joy James Prabhu

  • Author_Institution
    DCSE, Anna Univ., Chennai, India
  • fYear
    2010
  • fDate
    28-30 Oct. 2010
  • Firstpage
    61
  • Lastpage
    66
  • Abstract
    The convergence rate of iterative stationary methods for the PageRank linear system is significant for efficient computation of the relative rank of web pages based on the Web graph. This paper investigates this problem from the perspective of a message-passing framework on a distributed memory parallel programming model. A three-dimensional Web model is introduced to present a scalable link analysis algorithm that can compute the rank vector of a linear system with a high degree of convergence and accuracy. The algorithm derives its scalability by eliminating the need for any global information or page categorization. The details of the MPI-based parallel implementation and the experimental results obtained on a 8-node cluster of Intel Xeon 3.0 GHz, 1.5GB RAM machines over a 100MBit Ethernet LAN are provided. The variation in convergence behavior with residual probability is analyzed. For the WebBase crawl, a speedup of 12.45 is obtained by the scalable algorithm over the centralized algorithm.
  • Keywords
    Web sites; convergence; graph theory; iterative methods; message passing; parallel programming; probability; search engines; Ethernet LAN; Intel Xeon; MPI; PageRank linear system; Web graph; Web pages ranking; WebBase crawl; centralized algorithm; convergence rate; distributed memory parallel programming model; frequency 3.0 GHz; iterative stationary method; memory size 1.5 GByte; message-passing framework; residual probability; scalable algorithm; scalable link analysis algorithm; three-dimensional Web model; Algorithm design and analysis; Clustering algorithms; Convergence; Grid computing; Iterative methods; Linear systems; Markov processes;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Distributed and Grid Computing (PDGC), 2010 1st International Conference on
  • Conference_Location
    Solan
  • Print_ISBN
    978-1-4244-7675-6
  • Type

    conf

  • DOI
    10.1109/PDGC.2010.5679603
  • Filename
    5679603