• DocumentCode
    607288
  • Title

    A partitioning technique for improving the performance of PageRank on Hadoop

  • Author

    Hoon Choi ; Jungho Um ; Hwamook Yoon ; Minho Lee ; Yunsoo Choi ; Wongoo Lee ; Sakwang Song ; Hanmin Jung

  • Author_Institution
    Inf. & Software Res. Center, KISTI, Daejeon, South Korea
  • fYear
    2012
  • fDate
    3-5 Dec. 2012
  • Firstpage
    458
  • Lastpage
    461
  • Abstract
    Many research results have been reported on large-scale graph analysis on Hadoop. The performance of Hadoop-based graph analysis is affected by data partitioning. The effectiveness of a partitioning scheme depends on how well it maintains data locality on each node of the cluster, and this differs from problem to problem. One partitioning approach known to be effective is partitioning data by domain. For instance, this technique can be very useful for partitioning data by area when analyzing web graphs. However, the improvement from this kind of partitioning is limited to specific problems. In this paper, we propose a data partitioning technique based on semi-clustering for analyzing web graphs with the PageRank algorithm on Hadoop. In our experiments, PageRank computation with our partitioning technique shows improved performance as the number of iterations increases. This method can be very effective for large-scale graph processing.
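
    The domain-based partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's semi-clustering method; it only shows why grouping pages by host tends to keep links inside one partition. The function names, the CRC32-based hash, and the toy edge list are all assumptions made for illustration.

```python
import zlib
from urllib.parse import urlparse


def domain_of(url):
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def partition_by_domain(url, num_partitions):
    """Assign a page to a partition using a stable hash of its host (illustrative)."""
    return zlib.crc32(domain_of(url).encode("utf-8")) % num_partitions


def partition_by_url(url, num_partitions):
    """Baseline: assign a page to a partition using a stable hash of the full URL."""
    return zlib.crc32(url.encode("utf-8")) % num_partitions


def cross_partition_edges(edges, partitioner, num_partitions):
    """Count links whose source and target pages land in different partitions."""
    return sum(
        1
        for src, dst in edges
        if partitioner(src, num_partitions) != partitioner(dst, num_partitions)
    )


# Hypothetical toy web graph: three intra-domain links and one inter-domain link.
EDGES = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://a.example/2", "http://a.example/3"),
    ("http://b.example/1", "http://b.example/2"),
    ("http://a.example/3", "http://b.example/1"),
]

if __name__ == "__main__":
    for name, fn in (("by domain", partition_by_domain), ("by URL", partition_by_url)):
        print(name, cross_partition_edges(EDGES, fn, 4))
```

    Under domain partitioning, every intra-domain link stays within one partition, so only inter-domain links can cross partition boundaries. Fewer crossing links means less rank data has to move between nodes in each PageRank iteration.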
  • Keywords
    Internet; data analysis; graph theory; pattern clustering; Hadoop; PageRank algorithm; PageRank computation; Web graph analysis; data locality; data partitioning technique; semiclustering; PageRank; data partitioning; semi-clustering; web graph
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-0894-6
  • Type

    conf

  • Filename
    6530377