Title :
ADAPT: Availability-Aware MapReduce Data Placement for Non-dedicated Distributed Computing
Author :
Jin, Hui ; Yang, Xi ; Sun, Xian-He ; Raicu, Ioan
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
Abstract :
The MapReduce programming paradigm has gained popularity for its ease of programming, data distribution, and fault tolerance. Its low barrier to adoption makes it a promising framework for non-dedicated distributed computing environments. However, variability in host resources and availability can substantially degrade the performance of MapReduce applications. The replication-based fault tolerance mechanism alleviates some of these problems, but at the cost of inefficient storage space utilization. Intelligent solutions that guarantee the performance of MapReduce applications with a low degree of data replication are needed to make running MapReduce applications in non-dedicated environments viable at lower cost. In this research, we propose an Availability-aware Data Placement (ADAPT) strategy to improve application performance without extra storage cost. The basic idea of ADAPT is to dispatch data based on the availability of each node, so as to reduce network traffic, improve data locality, and optimize application performance. We implement a prototype of ADAPT within the Hadoop framework, an open-source implementation of MapReduce. The performance of ADAPT is evaluated in an emulated non-dedicated distributed environment. The experimental results show that ADAPT can improve performance by more than 30%. ADAPT achieves high reliability without the need for additional data replication. ADAPT has also been evaluated for large-scale computing environments through simulations, with promising results.
Keywords :
fault tolerant computing; parallel programming; ADAPT strategy; Hadoop framework; MapReduce adoption; MapReduce programming paradigm; availability-aware MapReduce data placement; data distribution; data locality; data replication degree; fault tolerance; large-scale computing environment; network traffic reduction; node availability; nondedicated distributed computing; open-source implementation; replication-based fault tolerance mechanism; resource availability; resource variability; storage cost; storage space utilization; Adaptation models; Availability; Computational modeling; Data models; Distributed databases; Interrupters; MapReduce; Performance; Reliability;
Conference_Title :
Distributed Computing Systems (ICDCS), 2012 IEEE 32nd International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4577-0295-2
DOI :
10.1109/ICDCS.2012.48
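
Illustrative sketch :
The abstract describes dispatching data according to each node's availability but does not give the placement rule itself. The Python sketch below is therefore only an assumption of what such a rule might look like: it splits a fixed number of data blocks across nodes in proportion to an availability weight, using largest-remainder rounding. The function name `place_blocks`, the weight values, and the proportional rule are hypothetical and are not taken from the paper or from Hadoop.

```python
from typing import Dict


def place_blocks(num_blocks: int, availability: Dict[str, float]) -> Dict[str, int]:
    """Split num_blocks across nodes in proportion to their availability.

    Hypothetical scheme for illustration; not the algorithm from the paper.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if any(a < 0 for a in availability.values()):
        raise ValueError("availability weights must be non-negative")
    total = sum(availability.values())
    if total <= 0:
        raise ValueError("at least one node must have positive availability")

    # Ideal fractional share of blocks for each node.
    shares = {node: num_blocks * a / total for node, a in availability.items()}

    # Give every node the whole-number part of its share first.
    counts = {node: int(share) for node, share in shares.items()}
    leftover = num_blocks - sum(counts.values())

    # Hand the remaining blocks to the nodes with the largest fractional
    # remainders (ties broken by node name so the result is deterministic).
    by_remainder = sorted(
        shares, key=lambda node: (shares[node] - counts[node], node), reverse=True
    )
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Assumed weights: a higher value means the node is interrupted less often.
    print(place_blocks(12, {"node-a": 3.0, "node-b": 2.0, "node-c": 1.0}))
```

Under this assumed rule, nodes that are available more often hold more blocks, so more map tasks can read their input locally; no additional replicas are created, since the total number of blocks placed stays equal to `num_blocks`.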