DocumentCode :
652254
Title :
DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds
Author :
Tudoran, Radu ; Costan, Alexandru ; Antoniu, Gabriel
Author_Institution :
INRIA Rennes - Bretagne Atlantique, France
fYear :
2013
fDate :
16-18 July 2013
Firstpage :
1057
Lastpage :
1064
Abstract :
A wide range of scientific applications, some generating data volumes exceeding petabytes, are currently being ported to clouds to build on their inherent elasticity and scalability. One of the critical needs in dealing with this "data deluge" is efficient, scalable and reliable storage. However, the storage services offered by cloud providers suffer from high latencies, trading performance for availability. One alternative is to federate the local virtual disks on the compute nodes into a globally shared storage used for large intermediate or checkpoint data. This collocated storage supports high throughput, but it can be very intrusive and is subject to failures that can stop the host node and degrade application performance. To deal with these limitations, we propose DataSteward, a data management system that provides a higher degree of reliability while remaining non-intrusive through the use of dedicated compute nodes. DataSteward harnesses the storage space of a set of dedicated VMs, selected using a topology-aware clustering algorithm, and has a lifetime dependent on the deployment lifetime. To capitalize on this separation, we introduce a set of scientific data processing services on top of the storage layer that can overlap with the executing applications. We performed extensive experiments on hundreds of cores in the Azure cloud: compared to state-of-the-art node selection algorithms, we show up to 20% higher throughput, which improves the overall performance of a real-life scientific application by up to 45%.
Keywords :
cloud computing; data handling; natural sciences computing; pattern clustering; storage management; virtual machines; Azure cloud; DataSteward; VM; checkpoint data; collocated storage; data deluge; data management system; dedicated compute nodes; deployment lifetime; globally shared storage; intermediate data; local virtual disk federation; node selection algorithm; public clouds; scientific applications; scientific data processing services; storage services; topology-aware clustering algorithm; Cloud computing; Clustering algorithms; Data processing; Distributed databases; Reliability; Servers; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/TrustCom.2013.129
Filename :
6680949