DocumentCode :
3092645
Title :
Data replication strategies in grid environments
Author :
Lamehamedi, Houda ; Szymanski, Boleslaw ; Shentu, Zujun ; Deelman, Ewa
Author_Institution :
Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
fYear :
2002
fDate :
23-25 Oct. 2002
Firstpage :
378
Lastpage :
383
Abstract :
Data grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address this problem, we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are based on a cost model that evaluates the data access costs and performance gains of creating each replica. Costs and gains are estimated from factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that form propagation graphs minimizing inter-replica communication costs. To evaluate our model, we use the network simulator NS to study the impact of replication. Our results show that replication improves data access performance on the data grid, and that the gain increases with the size of the data used.
Keywords :
fault tolerant computing; metacomputing; replicated databases; virtual machines; Internet; bandwidth; cost model; data access costs; data grids; data replication strategies; fault tolerance; flat topologies; geographically distributed resources; hierarchical topologies; high data availability; interreplica communication costs; large data sets; large-scale data-intensive applications; low bandwidth consumption; network simulator; performance gains; propagation graphs; protocols; replica size; replication decisions; replication management services; response time; run-time accumulated read/write statistics; scalability; Access protocols; Availability; Bandwidth; Costs; Delay; Internet; Large-scale systems; Mesh generation; Performance gain; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Algorithms and Architectures for Parallel Processing, 2002. Proceedings. Fifth International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7695-1512-6
Type :
conf
DOI :
10.1109/ICAPP.2002.1173605
Filename :
1173605