DocumentCode
2844948
Title
Towards efficient resource management for data-analytic platforms
Author
Castillo, Claris ; Spreitzer, Mike ; Steinder, Malgorzata
Author_Institution
T.J. Watson Res. Center, IBM, Hawthorne, NY, USA
fYear
2011
fDate
23-27 May 2011
Firstpage
73
Lastpage
80
Abstract
We present architectural and experimental work exploring the role of intermediate data handling in the performance of MapReduce workloads. Our findings show that: (a) certain jobs are more sensitive to disk cache size than others and (b) this sensitivity is mostly due to the local file I/O for the intermediate data. We also show that a small amount of memory is sufficient for the normal needs of map workers to hold their intermediate data until it is read. We introduce Hannibal, which exploits the modesty of that need in a simple and direct way - holding the intermediate data in application-level memory for precisely the needed time - to improve performance when the disk cache is stressed. We have implemented Hannibal and show through experimental evaluation that Hannibal can make MapReduce jobs run faster than Hadoop when little memory is available to the disk cache. This provides better performance insulation between concurrent jobs.
Keywords
cache storage; data handling; middleware; public domain software; Hadoop; Hannibal; MapReduce; MapReduce workloads; application-level memory; data-analytic platforms; disk cache size; intermediate data handling; open source middleware; resource management; Context; Insulation; Monitoring; Reliability; Resource management; Map-Reduce; disk; performance
fLanguage
English
Publisher
IEEE
Conference_Titel
Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on
Conference_Location
Dublin
Print_ISBN
978-1-4244-9219-0
Electronic_ISBN
978-1-4244-9220-6
Type
conf
DOI
10.1109/INM.2011.5990676
Filename
5990676