• DocumentCode
    74855
  • Title

    Extending MapReduce across Clouds with BStream

  • Author

    Kailasam, Sriram ; Dhawalia, Prateek ; Balaji, S.J. ; Iyer, Gopalakrishnan ; Dharanipragada, Janakiram

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India
  • Volume
    2
  • Issue
    3
  • fYear
    2014
  • fDate
    July-Sept. 1 2014
  • Firstpage
    362
  • Lastpage
    376
  • Abstract
    Today, batch processing frameworks like Hadoop MapReduce are difficult to scale to multiple clouds because of the latencies involved in inter-cloud data transfer and the synchronization overheads during the shuffle phase. This prevents the MapReduce framework from guaranteeing performance under variable load surges without over-provisioning in the internal cloud (IC). We propose BStream, a cloud bursting framework for MapReduce that couples stream processing in the external cloud (EC) with Hadoop in the IC. Stream processing in the EC enables pipelined uploading, processing and downloading of data to minimize network latencies. We use this framework to meet job deadlines. BStream uses an analytical model to minimize the usage of the EC. We propose different checkpointing strategies that overlap output transfer with input transfer/processing and simultaneously reduce the computation involved in merging the results from the EC and the IC. Checkpointing further reduces job completion time. We experimentally compare BStream with related work and illustrate the performance benefits of stream processing and checkpointing strategies in the EC. Lastly, we characterize the operational regime of BStream.
  • Keywords
    batch processing (computers); checkpointing; cloud computing; data handling; parallel programming; public domain software; synchronisation; BStream; Hadoop MapReduce; batch processing frameworks; checkpointing strategies; cloud bursting framework; computation reduction; data downloading; data processing; data uploading; intercloud data transfer; internal cloud; job completion time reduction; multiple clouds; shuffle-phase; stream processing; synchronization overheads; variable load surges; Analytical models; Batch production systems; Data transfer; Delays; Integrated circuits; Peer-to-peer computing; Storms; MapReduce; inter-cloud; stream processing;
  • fLanguage
    English
  • Journal_Title
    Cloud Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-7161
  • Type

    jour

  • DOI
    10.1109/TCC.2014.2316810
  • Filename
    6786985