• DocumentCode
    3537728
  • Title
    Assessing MapReduce for Internet Computing: A Comparison of Hadoop and BitDew-MapReduce
  • Author
    Lu, Lu; Jin, Hai; Shi, Xuanhua; Fedak, Gilles
  • Author_Institution
    Cluster & Grid Comput. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2012
  • fDate
    20-23 Sept. 2012
  • Firstpage
    76
  • Lastpage
    84
  • Abstract
    MapReduce is emerging as an important programming model for data-intensive applications. Adapting this model to desktop grids would make it possible to exploit their vast computing power and distributed storage to run a new range of applications able to process enormous amounts of data. In 2010, we presented the first implementation of MapReduce dedicated to Internet desktop grids, based on the BitDew middleware. In this paper, we present new optimizations to BitDew-MapReduce (BitDew-MR): aggressive task backup, intermediate result backup, task re-execution mitigation, and network failure hiding. We propose a new experimental framework that emulates key aspects of Internet desktop grids. Using this framework, we compare BitDew-MR and the open-source Hadoop middleware on Grid5000. Our experimental results show that 1) BitDew-MR successfully passes all the stress tests of the framework, while Hadoop is unable to work in a typical wide-area network topology that includes PCs hidden behind firewalls and NAT; 2) BitDew-MR outperforms Hadoop in several respects: scalability, fairness, resilience to node failures, and network disconnections.
  • Keywords
    Internet; grid computing; middleware; optimisation; public domain software; BitDew middleware; BitDew-MR; BitDew-MapReduce; Grid5000; Internet computing; Internet desktop grid; aggressive task backup; data-intensive application; desktop grid; intermediate result backup; network disconnections; network failure hiding; open-source Hadoop middleware; optimizations; task reexecution mitigation; Data processing; Distributed databases; Dynamic scheduling; Heart beat; Peer to peer computing; Runtime; MapReduce; cloud computing; desktop grid computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1550-5510
  • Print_ISBN
    978-1-4673-2901-9
  • Type
    conf
  • DOI
    10.1109/Grid.2012.31
  • Filename
    6319157