• DocumentCode
    1783317
  • Title

    Characterization and Optimization of Memory-Resident MapReduce on HPC Systems

  • Author

    Yandong Wang; Robin Goldstone; Weikuan Yu; Teng Wang

  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    799
  • Lastpage
    808
  • Abstract
    MapReduce is a widely accepted framework for addressing big data challenges. Recently, it has also gained broad attention from scientists at the U.S. leadership computing facilities as a promising solution for processing gigantic simulation results. However, conventional high-end computing systems are built on the compute-centric paradigm, while big data analytics applications prefer a data-centric paradigm such as MapReduce. This work characterizes the performance impact of key differences between the compute- and data-centric paradigms and then provides optimizations to enable a dual-purpose HPC system that can efficiently support both conventional HPC applications and new data analytics applications. Using Spark, a state-of-the-art MapReduce implementation, and the Hyperion system at Lawrence Livermore National Laboratory, we have examined the impact of storage architectures, data locality, and task scheduling on memory-resident MapReduce jobs. Based on our characterization and the resulting findings, we have introduced two optimization techniques, namely Enhanced Load Balancer and Congestion-Aware Task Dispatching, to improve the performance of Spark applications.
  • Keywords
    data analysis; optimisation; parallel processing; resource allocation; Hyperion system; Lawrence Livermore National Laboratory; Spark applications; compute-centric paradigms; congestion-aware task dispatching; data analytics applications; data locality; data-centric paradigm; dual-purpose HPC system; enhanced load balancer; high-end computing systems; memory-resident MapReduce jobs; optimization techniques; performance behaviors; storage architectures; task scheduling; Benchmark testing; Big data; Computer architecture; Optimization; Processor scheduling; Servers; Sparks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type

    conf

  • DOI
    10.1109/IPDPS.2014.87
  • Filename
    6877311