DocumentCode
587612
Title
A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns
Author
Abad, Cristina L. ; Roberts, Nick ; Yi Lu ; Campbell, Roy H.
Author_Institution
Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear
2012
fDate
4-6 Nov. 2012
Firstpage
100
Lastpage
109
Abstract
A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. However, we have a limited understanding of the workloads of Big Data storage systems. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large Hadoop clusters at Yahoo! and characterize the file popularity, temporal locality, and arrival patterns of the workloads. We identify several interesting properties and compare them with previous observations from web and media server workloads. To the best of our knowledge, this is the first study of how MapReduce workloads interact with the storage layer.
Keywords
Internet; data analysis; file servers; parallel programming; pattern clustering; storage management; Hadoop cluster; MapReduce workload; Web server workload; Yahoo!; arrival patterns; big data storage cluster; data processing requirement; data storage centric analysis; file popularity; media server workload; next generation storage system; temporal locality; Data handling; Data storage systems; Distribution functions; Information management; Media; Servers; Sociology; Big Data; HDFS; MapReduce; access patterns;
fLanguage
English
Publisher
IEEE
Conference_Titel
Workload Characterization (IISWC), 2012 IEEE International Symposium on
Conference_Location
La Jolla, CA
Print_ISBN
978-1-4673-4531-6
Type
conf
DOI
10.1109/IISWC.2012.6402909
Filename
6402909