Regional Information Center for Science and Technology - Low-Overhead Fault Tolerance for High-Throughput Data Processing Systems

DocumentCode :

2397842

Title :

Low-Overhead Fault Tolerance for High-Throughput Data Processing Systems

Author :

Martin, André ; Knauth, Thomas ; Creutz, Stephan ; Becker, Diogo ; Weigert, Stefan ; Fetzer, Christof ; Brito, Andrey

Author_Institution :

Tech. Univ. Dresden, Dresden, Germany

fYear :

2011

fDate :

20-24 June 2011

Firstpage :

689

Lastpage :

699

Abstract :

The MapReduce programming paradigm proved to be a useful approach for building highly scalable data processing systems. One important reason for its success is simplicity, including the fault tolerance mechanisms. However, this simplicity comes at a price: efficiency. MapReduce's fault tolerance scheme stores too much intermediate information on disk. This inefficiency negatively affects job completion time. Furthermore, this inefficiency in particular forbids the application of MapReduce in near real-time scenarios where jobs need to produce results quickly. In this paper, we discuss an alternative fault tolerance scheme that is inspired by virtual synchrony. The key feature of our approach is a low-overhead deterministic execution. Deterministic execution reduces the amount of persistently stored information. In addition, because persisting intermediate results is no longer required for fault tolerance, we use more efficient communication techniques that considerably improve job completion time and throughput. Our contribution is twofold: (i) we enable the use of MapReduce for jobs ranging from seconds to a few tens of seconds, satisfying these deadlines even in the case of failures, (ii) we considerably reduce the fault tolerance overhead and as such the overhead of MapReduce in general. Our modifications are transparent to the application.

Keywords :

data handling; deterministic algorithms; fault tolerance; parallel programming; MapReduce programming; communication technique; fault tolerance mechanism; high throughput data processing system; low overhead deterministic execution; low overhead fault tolerance; real time scenario; virtual synchrony; Aggregates; Computer crashes; Fault tolerance; Fault tolerant systems; Monitoring; Programming; Synchronization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Distributed Computing Systems (ICDCS), 2011 31st International Conference on

Conference_Location :

Minneapolis, MN

ISSN :

1063-6927

Print_ISBN :

978-1-61284-384-1

Electronic_ISBN :

1063-6927

Type :

conf

DOI :

10.1109/ICDCS.2011.29

Filename :

5961745

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2397842