DocumentCode
172891
Title
Improving Hadoop Service Provisioning in a Geographically Distributed Cloud
Author
Qi Zhang ; Ling Liu ; Kisung Lee ; Yang Zhou ; Ashutosh Singh ; Nagapramod Mandagere ; Sandeep Gopisetty ; Gabriel Alatorre
fYear
2014
fDate
June 27-July 2, 2014
Firstpage
432
Lastpage
439
Abstract
With more data generated and collected in a geographically distributed manner, combined with the increased computational requirements of large-scale data-intensive analysis, we have witnessed a growing demand for geographically distributed Cloud datacenters and hybrid Cloud service provisioning. These enable organizations to meet instantaneous demand for additional computational resources and to expand in-house resources with cloud resources to sustain peak service demands. A key challenge for running applications in such a geographically distributed computing environment is how to efficiently schedule and perform analysis over data that is geographically distributed across multiple datacenters. In this paper, we first compare multi-datacenter Hadoop deployment with single-datacenter Hadoop deployment to identify the performance issues inherent in a geographically distributed cloud. We also generalize the problem characterization to geographically distributed cloud datacenters and discuss general optimization strategies. We then describe the design and implementation of a suite of system-level optimizations for improving the performance of Hadoop service provisioning in a geo-distributed cloud, including prediction-based job localization, configurable HDFS data placement, and data prefetching. Our experimental evaluation shows that our prediction-based localization has a very low error ratio (below 5%) and that our optimizations can improve the execution time of the Reduce phase by 48.6%.
Keywords
cloud computing; parallel programming; resource allocation; Hadoop service provisioning; cloud resource utilization; configurable HDFS data placement; data intensive analysis; data prefetching; geographically distributed cloud data center; hybrid cloud service provisioning; prediction based localization; prediction-based job localization; Cloud computing; Distributed databases; Optimization; Predictive models; Schedules; Virtualization; Cross-cloud Hadoop deployment; Geographically distributed cloud; Hybrid cloud; Performance optimization;
fLanguage
English
Publisher
IEEE
Conference_Title
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location
Anchorage, AK
Print_ISBN
978-1-4799-5062-1
Type
conf
DOI
10.1109/CLOUD.2014.65
Filename
6973771