Title :
Locality Based Data Partitioning in MapReduce
Author :
Chunguang Wang ; Qingbo Wu ; Yusong Tan ; Wenzhu Wang ; Quanyuan Wu
Author_Institution :
Coll. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
The performance of MapReduce depends heavily on how input data is partitioned, because partitioning controls the degree of parallelism; however, current state-of-the-art solutions rely on naive methods that are far from optimal. In this paper, we outline a locality-based, skew-aware partitioning technique intended to address the data partitioning problems that seriously hamper job performance. Our solution, LBP (Locality Based Partitioning), clusters data blocks from the same node into a single partition, which needs only one map task to process, avoiding the time wasted on slot reallocation and on initializing multiple tasks. To address the data skew problem, we extend LBP to LBP-SA (LBP Skew Aware), which partitions the data file according to its record and computation skews and thereby reduces the variance in task lifetimes. Experimental results demonstrate that our solutions improve MapReduce processing performance remarkably compared with the traditional Hadoop implementation.
Keywords :
data handling; parallel processing; pattern clustering; Hadoop Implementation; LBP skew aware; LBP-SA; MapReduce; locality based data partitioning; locality based partition technology; naive methods; parallelism control; skew aware partition technology; slot reallocation; Arrays; Complexity theory; Navigation; Parallel processing; Partitioning algorithms; Peer-to-peer computing; Tin; Data partition; Data Skew; MapReduce; Locality Based Partitioning;
Conference_Title :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
DOI :
10.1109/CSE.2013.194
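
Illustrative sketch :
The abstract gives only a high-level description of LBP and LBP-SA, so the following is a minimal sketch of the idea and not the authors' implementation. It assumes each block is given as a (block id, node) pair. The function names `lbp_partition` and `split_by_cost`, the per-block cost estimates, and the `max_cost` threshold are all hypothetical, introduced here for illustration. The first function groups blocks stored on the same node into one partition, as LBP is described; the second is one possible way to limit skew, by cutting a node's partition into chunks of bounded estimated cost.

```python
from collections import defaultdict


def lbp_partition(blocks):
    """Group block ids by the node that stores them: one partition per node."""
    by_node = defaultdict(list)
    for block_id, node in blocks:
        by_node[node].append(block_id)
    return dict(by_node)


def split_by_cost(block_ids, cost, max_cost):
    """Hypothetical skew-aware step (an assumption, not from the paper).

    Cuts a node's block list into consecutive chunks whose summed estimated
    cost stays at or below max_cost. A single block whose cost exceeds
    max_cost ends up in a chunk of its own.
    """
    chunks, current, total = [], [], 0.0
    for b in block_ids:
        # Close the current chunk if adding this block would exceed the cap.
        if current and total + cost[b] > max_cost:
            chunks.append(current)
            current, total = [], 0.0
        current.append(b)
        total += cost[b]
    if current:
        chunks.append(current)
    return chunks


# Toy input: five blocks spread over two nodes, with made-up cost estimates.
blocks = [("b1", "nodeA"), ("b2", "nodeB"), ("b3", "nodeA"),
          ("b4", "nodeB"), ("b5", "nodeA")]
cost = {"b1": 1.0, "b2": 4.0, "b3": 1.0, "b4": 1.0, "b5": 3.0}

partitions = lbp_partition(blocks)
balanced = {node: split_by_cost(ids, cost, max_cost=4.0)
            for node, ids in partitions.items()}
```

With the toy input, `partitions` holds one block list per node, and `balanced` splits each list wherever the accumulated cost would pass the cap. How the paper actually estimates record and computation skew is not stated in the abstract.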