مرکز منطقه ای اطلاع رساني علوم و فناوري - Supporting fault tolerance in a data-intensive computing middleware

DocumentCode :

2441773

Title :

Supporting fault tolerance in a data-intensive computing middleware

Author :

Bicer, Tekin ; Jiang, Wei ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2010

fDate :

19-23 April 2010

Firstpage :

Lastpage :

Abstract :

Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developing this class of applications. Besides programmability and ease of parallelization, fault tolerance is clearly important for data-intensive applications, because of their long running nature, and because of the potential for using a large number of nodes for processing massive amounts of data. Fault-tolerance has been an important attribute of map-reduce as well in its Hadoop implementation, where it is based on replication of data in the file system. Two important goals in supporting fault-tolerance are low overheads and efficient recovery. With these goals, this paper describes a different approach for enabling data-intensive computing with fault-tolerance. Our approach is based on an API for developing data-intensive computations that is a variation of map-reduce, and it involves an explicit programmer-declared reduction object. We show how more efficient fault-tolerance support can be developed using this API. Particularly, as the reduction object represents the state of the computation on a node, we can periodically cache the reduction object from every node at another location and use it to support failure-recovery. We have extensively evaluated our approach using two data-intensive applications. Our results show that the overheads of our scheme are extremely low, and our system outperforms Hadoop both in absence and presence of failures.

Keywords :

fault tolerant computing; middleware; system recovery; API; Hadoop implementation; data-intensive computations; data-intensive computing middleware; failure-recovery; fault tolerance; map-reduce; programmability; programmer-declared reduction object; Cloud computing; Computer science; Data analysis; Data engineering; Fault tolerance; File systems; High performance computing; Image analysis; Large-scale systems; Middleware; Cloud computing; Data-intensive computing; Fault tolerance; Map-Reduce;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location :

Atlanta, GA

ISSN :

1530-2075

Print_ISBN :

978-1-4244-6442-5

Type :

conf

DOI :

10.1109/IPDPS.2010.5470462

Filename :

5470462

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2441773