• DocumentCode
    2844948
  • Title
    Towards efficient resource management for data-analytic platforms
  • Author
    Castillo, Claris ; Spreitzer, Mike ; Steinder, Malgorzata
  • Author_Institution
    T.J. Watson Res. Center, IBM, Hawthorne, NY, USA
  • fYear
    2011
  • fDate
    23-27 May 2011
  • Firstpage
    73
  • Lastpage
    80
  • Abstract
    We present architectural and experimental work exploring the role of intermediate data handling in the performance of MapReduce workloads. Our findings show that: (a) certain jobs are more sensitive to disk cache size than others and (b) this sensitivity is mostly due to the local file I/O for the intermediate data. We also show that a small amount of memory is sufficient for the normal needs of map workers to hold their intermediate data until it is read. We introduce Hannibal, which exploits the modesty of that need in a simple and direct way - holding the intermediate data in application-level memory for precisely the needed time - to improve performance when the disk cache is stressed. We have implemented Hannibal and show through experimental evaluation that Hannibal can make MapReduce jobs run faster than Hadoop when little memory is available to the disk cache. This provides better performance insulation between concurrent jobs.
  • Keywords
    cache storage; data handling; middleware; public domain software; Hadoop; Hannibal; MapReduce; MapReduce workloads; application-level memory; data-analytic platforms; disk cache size; intermediate data handling; open source middleware; resource management; Context; Insulation; Monitoring; Reliability; Resource management; Hadoop; Map-Reduce; disk; performance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on
  • Conference_Location
    Dublin
  • Print_ISBN
    978-1-4244-9219-0
  • Electronic_ISBN
    978-1-4244-9220-6
  • Type
    conf
  • DOI
    10.1109/INM.2011.5990676
  • Filename
    5990676