• DocumentCode
    1999662
  • Title
    HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL
  • Author
    Grossman, Max ; Breternitz, Mauricio ; Sarkar, Vivek
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1918
  • Lastpage
    1927
  • Abstract
    As the scale of high performance computing systems grows, three main challenges arise: the programmability, reliability, and energy efficiency of those systems. Accomplishing all three without sacrificing performance requires a rethinking of legacy distributed programming models and homogeneous clusters. In this work, we integrate Hadoop MapReduce with OpenCL to enable the use of heterogeneous processors in a distributed system. We do this by exploiting the implicit data parallelism of mappers and reducers in a MapReduce system. Combining Hadoop and OpenCL provides 1) an easy-to-learn and flexible application programming interface in a high level and popular programming language, 2) the reliability guarantees and distributed file system of Hadoop, and 3) the low power consumption and performance acceleration of heterogeneous processors. This paper presents HadoopCL: an extension to Hadoop which supports execution of user-written Java kernels on heterogeneous devices, optimizes communication through asynchronous transfers and dedicated I/O threads, automatically generates OpenCL kernels from Java byte code using the open source tool APARAPI, and achieves nearly 3x overall speedup and better than 55x speedup of the computational sections for example MapReduce applications, relative to Hadoop.
  • Keywords
    application program interfaces; programming languages; Hadoop MapReduce system; HadoopCL; Java byte code; MapReduce applications; OpenCL kernels; computational sections; data parallelism; distributed file system; distributed heterogeneous platforms; distributed system; energy efficiency; flexible application programming interface; heterogeneous devices; heterogeneous processors; high performance computing systems; homogeneous clusters; legacy distributed programming models; open source tool APARAPI; programmability; programming language; reliability; seamless integration; user written Java kernels; Complexity theory; Graphics processing units; Instruction sets; Java; Kernel; Programming; Reliability; GPGPU; Hadoop; OpenCL; heterogeneous; multicore;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2013.246
  • Filename
    6651095
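
The "implicit data parallelism of mappers" mentioned in the abstract can be illustrated with a minimal Java sketch. This is only an illustration under stated assumptions: the class `DataParallelMapSketch` and its methods `mapOne`, `mapSequential`, and `mapParallel` are hypothetical names, and the code uses neither the HadoopCL API nor APARAPI. It shows only that a map function which touches one record at a time can be applied to all records in any order, which is the property that makes offloading to parallel devices possible.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration only; this is NOT the HadoopCL or APARAPI API.
class DataParallelMapSketch {

    // A "mapper" that handles one input record and depends on nothing else.
    static int mapOne(int value) {
        return value * value;
    }

    // Sequential application: one record at a time, in order.
    static int[] mapSequential(int[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = mapOne(input[i]);
        }
        return out;
    }

    // Each call to mapOne is independent and writes to its own output slot,
    // so the same work can be spread across CPU cores with a parallel stream.
    static int[] mapParallel(int[] input) {
        int[] out = new int[input.length];
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = mapOne(input[i]));
        return out;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4};
        System.out.println(Arrays.toString(mapParallel(input)));
    }
}
```

Both methods return the same array for the same input, because no record's result depends on another's. A real system in the spirit of the abstract would replace the parallel stream with generated device kernels, which this sketch does not attempt.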