DocumentCode :
2262088
Title :
Cost-aware replication for dataflows
Author :
Castillo, Claris ; Tantawi, Asser N. ; Arroyo, Diana ; Steinder, Malgorzata
Author_Institution :
IBM T. J. Watson Res. Center, Hawthorne, NY, USA
fYear :
2012
fDate :
16-20 April 2012
Firstpage :
171
Lastpage :
178
Abstract :
In this work we are concerned with the cost of replicating intermediate data for dataflows in Cloud environments. This cost is attributed to the extra resources required to create and maintain the additional replicas of a given data set. Existing data-analytic platforms such as Hadoop provide fault-tolerance guarantees by relying on aggressive replication of intermediate data. We argue that the decision to replicate, along with the number of replicas, should be a function of the resource usage and the utility of the data, in order to minimize the cost of reliability. Furthermore, the utility of the data is determined by the structure of the dataflow and the reliability of the system. We propose a replication technique that takes into account resource usage, system reliability, and the characteristics of the dataflow to decide what data to replicate and when to replicate it. The replication decision is obtained by solving a constrained integer programming problem, given information about the dataflow up to a decision point. In addition, we built a working prototype of our technique, CARDIO, and show through experimental evaluation on a real testbed that it finds an optimal solution.
Keywords :
cloud computing; cost reduction; data analysis; data flow analysis; fault tolerant computing; resource allocation; software reliability; Hadoop; cloud environments; constrained integer programming problem; cost aware CARDIO; cost aware replication; data utility; dataflow structure; decision point; fault tolerance; intermediate data replication; replication decision; resource usage; system reliability cost minimization; Availability; Degradation; Measurement; Optimization; Silicon; data-availability; dataflows; map-reduce; replication
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Operations and Management Symposium (NOMS), 2012 IEEE
Conference_Location :
Maui, HI
ISSN :
1542-1201
Print_ISBN :
978-1-4673-0267-8
Electronic_ISBN :
1542-1201
Type :
conf
DOI :
10.1109/NOMS.2012.6211896
Filename :
6211896