DocumentCode
244520
Title
NEWT - A resilient BSP framework for iterative algorithms on Hadoop YARN
Author
Kromonov, Ilja ; Jakovits, P. ; Srirama, Satish Narayana
Author_Institution
Inst. of Comput. Sci., Univ. of Tartu, Tartu, Estonia
fYear
2014
fDate
21-25 July 2014
Firstpage
251
Lastpage
259
Abstract
The importance of fault tolerance for parallel computing is ever increasing. The mean time between failures (MTBF) is predicted to decrease significantly for future highly parallel systems. At the same time, the current trend to use commodity hardware to reduce the cost of clusters puts pressure on users to ensure fault tolerance of their applications. Cloud-based resources are one of the environments where the latter holds true. When it comes to embarrassingly parallel data-intensive algorithms, MapReduce has gone a long way in ensuring users can easily utilize these resources without the fear of losing work. However, this does not apply to iterative communication-intensive algorithms common in the scientific computing domain. In this work we propose a new programming model, inspired by Bulk Synchronous Parallel (BSP), for creating a new fault-tolerant distributed computing framework. We strive to retain the advantages that MapReduce provides, yet efficiently support a larger assortment of algorithms, such as the aforementioned iterative ones. The model adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates the fault tolerance inherent in the BSP program structure. Based on the model, we created a distributed computing framework, NEWT, which we describe and use to validate the approach.
Keywords
fault tolerant computing; iterative methods; parallel algorithms; parallel programming; Hadoop YARN; MTBF; MapReduce; NEWT framework; bulk synchronous parallel programming; cloud-based resources; fault tolerance; fault tolerant distributed computing framework; iterative algorithms; iterative communication-intensive algorithms; mean time between failure; parallel computing; parallel data-intensive algorithms; resilient BSP framework; Adaptation models; Computational modeling; Fault tolerant systems; Programming; Synchronization; Bulk Synchronous Parallel; cloud computing
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing & Simulation (HPCS), 2014 International Conference on
Conference_Location
Bologna
Print_ISBN
978-1-4799-5312-7
Type
conf
DOI
10.1109/HPCSim.2014.6903693
Filename
6903693