DocumentCode
1783317
Title
Characterization and Optimization of Memory-Resident MapReduce on HPC Systems
Author
Wang, Yandong ; Goldstone, Robin ; Yu, Weikuan ; Wang, Teng
fYear
2014
fDate
19-23 May 2014
Firstpage
799
Lastpage
808
Abstract
MapReduce is a widely accepted framework for addressing big data challenges. Recently, it has also drawn broad attention from scientists at the U.S. leadership computing facilities as a promising way to process enormous simulation results. However, conventional high-end computing systems are built on a compute-centric paradigm, whereas big data analytics applications favor a data-centric paradigm such as MapReduce. This work characterizes the performance impact of the key differences between the compute- and data-centric paradigms, and then provides optimizations that enable a dual-purpose HPC system to efficiently support both conventional HPC applications and new data analytics applications. Using Spark, a state-of-the-art MapReduce implementation, on the Hyperion system at Lawrence Livermore National Laboratory, we have examined how storage architectures, data locality, and task scheduling affect memory-resident MapReduce jobs. Based on our characterization of these performance behaviors, we have introduced two optimization techniques, Enhanced Load Balancer and Congestion-Aware Task Dispatching, to improve the performance of Spark applications.
Keywords
data analysis; optimisation; parallel processing; resource allocation; Hyperion system; Lawrence Livermore National Laboratory; Spark applications; compute-centric paradigms; congestion-aware task dispatching; data analytics applications; data locality; data-centric paradigm; dual-purpose HPC system; enhanced load balancer; high-end computing systems; memory-resident MapReduce jobs; optimization techniques; performance behaviors; storage architectures; task scheduling; Benchmark testing; Big data; Computer architecture; Optimization; Processor scheduling; Servers; Sparks;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.87
Filename
6877311