• DocumentCode
    659402
  • Title
    P-DOT: A model of computation for big data
  • Author
    Tao Luo ; Yin Liao ; Guoliang Chen ; Yunquan Zhang
  • Author_Institution
    School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    31
  • Lastpage
    37
  • Abstract
    In response to the high demand for big data analytics, several programming models for large distributed cluster systems have been proposed and implemented, such as MapReduce, Dryad and Pregel. However, compared with high performance computing, the basis and principles of the computation and communication behavior of big data analytics are not well studied. In this paper, we review the current big data computational models DOT and DOTA, and propose a more general and practical model, p-DOT (p-phases DOT). p-DOT is not a simple extension; it matters in two respects. In generality, any big data analytics job execution expressed in the DOT model or the BSP model can be represented by it; in practicality, it accounts for I/O behavior when evaluating performance overhead. Moreover, we provide a cost function implying that, for a fixed algorithm and workload, the optimal number of machines is near-linear in the square root of the input size, and we demonstrate the effectiveness of this function through several experiments.
  • Keywords
    Big Data; data analysis; parallel processing; BSP model; Big data analytics; Big data computational model; DOTA; Dryad; I/O behavior; MapReduce; P-DOT; Pregel; bulk synchronous parallel model; cost function; distributed cluster systems; high performance computing areas; p-phases DOT; programming models; Analytical models; Computational modeling; Data handling; Data models; Data storage systems; Information management; big data; computational model; distributed system
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE International Conference on Big Data
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691551
  • Filename
    6691551