• DocumentCode
    2297772
  • Title
    Benchmarking MapReduce Implementations for Application Usage Scenarios
  • Author
    Fadika, Zacharia ; Dede, Elif ; Govindaraju, Madhusudhan ; Ramakrishnan, Lavanya
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York (SUNY) at Binghamton, Binghamton, NY, USA
  • fYear
    2011
  • fDate
    21-23 Sept. 2011
  • Firstpage
    90
  • Lastpage
    97
  • Abstract
    The MapReduce paradigm provides a scalable model for large-scale data-intensive computing and associated fault tolerance. With data production increasing daily due to ever-growing application needs, scientific endeavors, and consumption, the MapReduce model and its implementations need to be further evaluated, improved, and strengthened. Several MapReduce frameworks with various degrees of conformance to the key tenets of the model are available today, each optimized for specific features. HPC application and middleware developers must thus understand the complex dependencies between MapReduce features and their applications. We present a standard benchmark suite for quantifying, comparing, and contrasting the performance of MapReduce platforms under a wide range of representative use cases. We report the performance of three different MapReduce implementations on the benchmarks and draw conclusions about their current performance characteristics. The three platforms we chose for evaluation are the widely used Apache Hadoop implementation; Twister, which has been discussed in the literature; and LEMO-MR, our own implementation. Our performance analysis also sheds light on the available design decisions for future implementations and allows Grid researchers to choose the MapReduce implementation that best suits their application's needs.
  • Keywords
    benchmark testing; fault tolerant computing; grid computing; middleware; software performance evaluation; Apache Hadoop implementation; HPC application; LEMO-MR; MapReduce features; MapReduce paradigm; MapReduce platforms; Twister; application usage scenarios; associated fault-tolerance; benchmarking MapReduce implementations; complex dependency; current performance characteristics; data production; design decisions; grid researchers; large scale data-intensive computing; middleware developers; performance analysis; representative use cases; scalable model; standard benchmark suite; Benchmark testing; Data processing; Fault tolerance; Fault tolerant systems; Linux; Memory management; Random access memory; Benchmarking; Hadoop; LEMO-MR; MapReduce;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on
  • Conference_Location
    Lyon
  • ISSN
    1550-5510
  • Print_ISBN
    978-1-4577-1904-2
  • Type
    conf
  • DOI
    10.1109/Grid.2011.21
  • Filename
    6076503