Title :
Making MapReduce scheduling effective in erasure-coded storage clusters
Author :
Runhui Li ; Patrick P. C. Lee
Author_Institution :
Chinese Univ. of Hong Kong, Hong Kong, China
Abstract :
With the explosive growth of data, enterprises increasingly adopt erasure coding on storage clusters to save storage space. However, erasure coding incurs higher performance overhead, especially during recovery. This motivates us to study the feasibility of alleviating the performance overhead of erasure coding while maintaining its storage efficiency advantage. In this paper, we study the performance of MapReduce when it runs on erasure-coded storage. We first review our previously proposed degraded-first scheduling, which avoids network bandwidth competition among degraded map tasks in failure mode and hence improves MapReduce performance over the default locality-first scheduling. We then show that basic degraded-first scheduling may not work effectively when multiple MapReduce jobs are running, and we propose heuristics to enhance the degraded-first scheduling design. Simulations demonstrate the performance gain of our enhanced degraded-first scheduling in a multi-job scenario. Our work makes a case that a new design of MapReduce scheduling is critical when we move to erasure-coded storage.
Keywords :
business data processing; encoding; parallel processing; processor scheduling; storage management; MapReduce performance; MapReduce scheduling; default locality-first scheduling; degraded-first scheduling; erasure-coded storage clusters; performance gain; performance overhead; storage space; Bandwidth; Encoding; Performance gain; Runtime; Schedules; Scheduling; Switches; Erasure coding; MapReduce; storage systems
Conference_Title :
2015 IEEE International Workshop on Local and Metropolitan Area Networks (LANMAN)
Conference_Location :
Beijing
DOI :
10.1109/LANMAN.2015.7114730