Regional Information Center for Science and Technology - DRAW: A new Data-gRouping-AWare data placement scheme for data intensive applications with interest locality

DocumentCode :

589530

Title :

DRAW: A new Data-gRouping-AWare data placement scheme for data intensive applications with interest locality

Author :

Pengju Shang ; Qiangju Xiao ; Jun Wang

Author_Institution :

Univ. of Central Florida, Orlando, FL, USA

fYear :

2012

fDate :

Oct. 31 2012-Nov. 2 2012

Firstpage :

Lastpage :

Abstract :

Recent years have seen an increasing number of scientists employ data-parallel computing frameworks such as MapReduce and Hadoop to run data-intensive applications and conduct analysis. In these co-located compute and storage frameworks, a well-chosen data placement scheme can significantly improve performance. Existing data-parallel frameworks, e.g., Hadoop or Hadoop-based clouds, distribute data using a random placement method for simplicity and load balance. However, we observe that many data-intensive applications exhibit interest locality: they sweep only part of a big data set. Data that are often accessed together owe this to their grouping semantics. Because it does not take data grouping into consideration, random placement performs poorly and falls well below the efficiency of an optimal data distribution. In this paper, we develop a new Data-gRouping-AWare (DRAW) data placement scheme to address this problem. DRAW dynamically scrutinizes the data accesses recorded in system log files. It extracts optimal data groupings and reorganizes data layouts to achieve the maximum parallelism per group, subject to load balance. By experimenting with two real-world MapReduce applications under different data placement schemes on a 40-node test bed, we conclude that, in comparison with Hadoop's default random placement, DRAW increases the total number of local map tasks executed by up to 59.8%, reduces the completion latency of the map phase by up to 41.7%, and improves overall performance by 36.4%.
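The abstract describes two steps: extract data groupings from access logs, then lay out each group so its members are spread across nodes (maximizing per-group parallelism) while keeping node load balanced. The sketch below illustrates that idea only; it is not the authors' DRAW algorithm. It assumes the log has already been parsed into a list of jobs, each a list of block IDs, and the names `co_access_counts`, `group_blocks`, `place`, and the `threshold` parameter are hypothetical. Grouping uses a simple union-find over block pairs co-accessed at least `threshold` times; placement greedily puts each block of a group on the least-loaded node not yet used by that group.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(jobs):
    """Count how often each pair of blocks is read by the same job (hypothetical log format)."""
    counts = Counter()
    for blocks in jobs:
        for a, b in combinations(sorted(set(blocks)), 2):
            counts[(a, b)] += 1
    return counts


def group_blocks(jobs, threshold):
    """Merge blocks co-accessed at least `threshold` times into groups (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for blocks in jobs:  # register every block, even ones never grouped
        for b in blocks:
            find(b)
    for (a, b), n in co_access_counts(jobs).items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for b in sorted(parent):
        groups.setdefault(find(b), []).append(b)
    return sorted(groups.values())


def place(groups, num_nodes):
    """Spread each group's blocks over distinct nodes, preferring the least-loaded node."""
    load = [0] * num_nodes
    placement = {}
    for group in sorted(groups, key=len, reverse=True):  # largest groups first
        used = set()
        for block in group:
            if len(used) == num_nodes:  # group larger than cluster: start a new round
                used = set()
            node = min(
                (n for n in range(num_nodes) if n not in used),
                key=lambda n: (load[n], n),
            )
            placement[block] = node
            used.add(node)
            load[node] += 1
    return placement


if __name__ == "__main__":
    jobs = [["a", "b", "c"], ["a", "b", "c"], ["d", "e"], ["d", "e"], ["a", "f"]]
    groups = group_blocks(jobs, threshold=2)
    print(groups)  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
    print(place(groups, num_nodes=3))
```

In this toy run, blocks `a`, `b`, and `c` land on three different nodes, so a job reading that group can run its map tasks in parallel with local data, whereas random placement could put several of them on the same node.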

Keywords :

cloud computing; data handling; parallel processing; Data-gRouping-AWare data placement scheme; Hadoop-based clouds; MapReduce; data groupings; data layouts; data parallel computing frameworks; data parallel frameworks; grouping semantics; wise data placement scheme; Abstracts; Bioinformatics; Genomics; Indexes; Random access memory; Synthetic aperture sonar; Data layout; Data-intensive; Hadoop; MapReduce;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

APMRC, 2012 Digest

Conference_Location :

Singapore

Print_ISBN :

978-1-4673-4734-1

Type :

conf

Filename :

6407372

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=589530