• DocumentCode
    650614
  • Title

    Data Replication for Distributed Graph Processing

  • Author

    Li-Yung Ho ; Jan-Jan Wu ; Pangfeng Liu

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    319
  • Lastpage
    326
  • Abstract
    We present a data replication framework for distributed graph processing. First we partition a graph and store each partition in a machine. Then we replicate all partitions and assign replicas to machines, where each machine can store only a limited number of replicas. The goal is to replicate the partitions so that each partition has at least a certain number of replicated copies, and the cost is minimized. The cost is defined as the data traffic needed to run general graph processing algorithms. The cost metric is the overall transmission cost of all machines, and the maximum transmission cost of a single machine. We propose an optimal algorithm based on linear programming to solve the problem of minimizing the overall transmission cost. We also propose an optimal algorithm to solve a special problem of minimizing the maximum transmission cost of a node.
  • Keywords
    data analysis; distributed processing; graph theory; linear programming; minimisation; cost metric; data replication framework; data traffic; distributed graph processing; graph partitioning; linear programming; transmission cost minimisation; Clustering algorithms; Data models; Distributed databases; Equations; Linear programming; Social network services; Vectors; algorithm; binary integer programming; data replication; minimum cost flow; social networks; totally unimodular;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type

    conf

  • DOI
    10.1109/CLOUD.2013.55
  • Filename
    6676710