  • DocumentCode
    2052747
  • Title
    Automatic Task Re-organization in MapReduce
  • Author
    Guo, Zhenhua; Pierce, Marlon; Fox, Geoffrey; Zhou, Mo
  • Author_Institution
    Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    335
  • Lastpage
    343
  • Abstract
    MapReduce is increasingly considered a useful parallel programming model for large-scale data processing. It exploits parallelism among executions of the primitive map and reduce operations. Hadoop, an open-source implementation of MapReduce, has been used in both academic research and industrial production. However, its implementation strategy, in which each map task processes exactly one data block, limits the degree of concurrency and degrades performance because available resources cannot be fully utilized. In addition, Hadoop assumes that task execution times within each phase vary little; this assumption does not always hold, which makes speculative execution useless. In this paper, we present mechanisms that dynamically split and consolidate tasks to balance load and to break through the concurrency limit imposed by fixed task granularity. For single-job systems, we propose two algorithms: one for when prior knowledge is available and one for when it is not. For multi-job cases, we propose a modified shortest-job-first strategy which, when combined with task splitting, theoretically minimizes job turnaround time. We compared our approach with the default task scheduling strategy using both synthesized and trace-based workloads. Simulation results show that our approach improves performance significantly.
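    The two scheduling claims in the abstract can be illustrated with a toy model. This is a minimal sketch, not the authors' algorithms (the record does not describe them): it assumes identical map slots, perfectly divisible work with zero splitting overhead, and jobs that all arrive at time zero. It contrasts one-task-per-block scheduling with freely split work, and shows the classic shortest-processing-time intuition behind shortest-job-first. All function names are hypothetical.

```python
import heapq


def makespan_fixed(block_times, slots):
    """Makespan when every block is one indivisible map task.

    Hypothetical helper: greedy longest-first list scheduling onto
    identical slots, so at most len(block_times) slots ever do work.
    """
    loads = [0.0] * slots
    heapq.heapify(loads)
    for t in sorted(block_times, reverse=True):
        least = heapq.heappop(loads)      # currently least-loaded slot
        heapq.heappush(loads, least + t)  # give it the next task
    return max(loads)


def makespan_split(block_times, slots):
    """Idealized makespan when work can be split evenly across all slots.

    Hypothetical helper: assumes no split/merge overhead, so this is a
    lower bound rather than a realistic estimate.
    """
    return sum(block_times) / slots


def mean_turnaround(job_work, slots, order):
    """Mean turnaround when jobs run back to back, each spread over all slots.

    Hypothetical helper: assumes every job arrives at time 0 and is
    perfectly divisible, so job j takes job_work[j] / slots time units.
    """
    clock, total = 0.0, 0.0
    for j in order:
        clock += job_work[j] / slots
        total += clock  # completion time equals turnaround here
    return total / len(order)


blocks = [10, 10, 10]  # three data blocks, eight free slots
print(makespan_fixed(blocks, 8))  # 10.0: five slots sit idle
print(makespan_split(blocks, 8))  # 3.75: all slots share the work

jobs = [30, 10, 20]
fifo = [0, 1, 2]
sjf = sorted(range(len(jobs)), key=lambda j: jobs[j])  # [1, 2, 0]
print(round(mean_turnaround(jobs, 5, fifo), 2))  # 8.67
print(round(mean_turnaround(jobs, 5, sjf), 2))   # 6.67
```

    Under these idealized assumptions the cluster behaves like one fast machine, which is why running the shortest job first lowers mean turnaround; real splitting overhead and data locality, which the paper's simulations account for in ways not described here, would narrow both gaps.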
  • Keywords
    large-scale systems; parallel programming; resource allocation; scheduling; Hadoop; MapReduce; automatic task reorganization; large-scale data processing; load balancing; shortest-job-first strategy; task granularity; task scheduling; task splitting; Clustering algorithms; Concurrent computing; Data models; Educational institutions; Load management; Scheduling; Skeleton; Bag-of-Divisible-Tasks; Load Balancing; Task Splitting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2011.44
  • Filename
    6061152