• DocumentCode
    74459
  • Title
    Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration
  • Author
    Weikuan Yu ; Yandong Wang ; Xinyu Que
  • Author_Institution
    Dept. of Comput. Sci. & Software Eng., Auburn Univ., Auburn, AL, USA
  • Volume
    25
  • Issue
    3
  • fYear
    2014
  • fDate
    March 2014
  • Firstpage
    602
  • Lastpage
    611
  • Abstract
    Hadoop is a popular open-source implementation of the MapReduce programming model for cloud computing. However, it faces a number of issues in achieving the best performance from the underlying systems. These include a serialization barrier that delays the reduce phase, repetitive merges and disk accesses, and a lack of portability to different interconnects. To keep up with the increasing volume of data sets, Hadoop also requires efficient I/O capability from the underlying computer systems to process and analyze data. We describe Hadoop-A, an acceleration framework that optimizes Hadoop with plug-in components for fast data movement, overcoming the existing limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. Our experimental results show that Hadoop-A significantly speeds up data movement in MapReduce and doubles the throughput of Hadoop. In addition, Hadoop-A significantly reduces disk accesses caused by intermediate data.
  • Keywords
    cloud computing; input-output programs; merging; parallel programming; Hadoop-A acceleration framework; MapReduce programming model; data merging; data movement; input-output capability; interconnects; merge phase; network-levitated merge algorithm; reduce phase; shuffle phase; Acceleration; Algorithm design and analysis; Data processing; IP networks; Pipelines; Protocols; Servers; Hadoop; Hadoop acceleration; MapReduce; network-levitated merge
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2013.59
  • Filename
    6471971