DocumentCode :
669933
Title :
Semantic characterization of MapReduce workloads
Author :
Xu, Zhihong ; Hirzel, Martin ; Rothermel, Gregg
Author_Institution :
Univ. of Nebraska, Lincoln, NE, USA
fYear :
2013
fDate :
22-24 Sept. 2013
Firstpage :
87
Lastpage :
97
Abstract :
MapReduce is a platform for analyzing large amounts of data on clusters of commodity machines. MapReduce is popular, in part thanks to its apparent simplicity. However, there are unstated requirements for the semantics of MapReduce applications that can affect their correctness and performance. MapReduce implementations do not check whether user code satisfies these requirements, leading to time-consuming debugging sessions, performance problems, and, worst of all, silently corrupt results. This paper makes these requirements explicit, framing them as semantic properties and assumed outcomes. It describes a black-box approach for testing for these properties, and uses the approach to characterize the semantics of 23 non-trivial MapReduce workloads. Surprisingly, we found that for most requirements, there is at least one workload that violates it. This means that MapReduce may be simple to use, but it is not as simple to use correctly. Based on our results, we provide insights to users on how to write higher-quality MapReduce code, and insights to system and language designers on ways to make their platforms more robust.
Keywords :
data analysis; parallel programming; program diagnostics; program testing; black-box testing; commodity machines; higher-quality MapReduce code; semantic MapReduce workload characterization; semantic properties; Commutation; Context; Debugging; Educational institutions; Fault tolerance; Semantics; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Workload Characterization (IISWC), 2013 IEEE International Symposium on
Conference_Location :
Portland, OR
Print_ISBN :
978-1-4799-0553-9
Type :
conf
DOI :
10.1109/IISWC.2013.6704673
Filename :
6704673