Title :
A Case Study of Optimizing Big Data Analytical Stacks Using Structured Data Shuffling
Author :
Dixin Tang;Taoying Liu;Rubao Lee;Hong Liu;Wei Li
Abstract :
Current major big data analytical stacks often consist of a general-purpose, multi-stage computation framework (e.g., Hadoop) with an SQL query system (e.g., Hive) on top of it. A key factor in query performance is the efficiency of data shuffling between two execution stages (e.g., Map/Reduce). In current data shuffling, much useful information about the shuffled data and the query over that data is simply discarded. In this paper, we make a strong case for cross-layer optimization of the Hive/Hadoop stack: we have designed and implemented a novel data shuffling mechanism in Hadoop, called Structured Data Shuffling (S-Shuffle), which carefully leverages the rich information in data and queries to optimize overall query processing. Our experimental results with the industry-standard TPC-H benchmark show that, by using S-Shuffle, the performance of SQL query processing on Hadoop can be improved by up to 2.4x.
Keywords :
"Merging","Sorting","Optimization","Data mining","Yttrium","Big data","Compression algorithms"
Conference_Titel :
2015 IEEE International Conference on Cluster Computing (CLUSTER)
DOI :
10.1109/CLUSTER.2015.19
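Note :
The paper's actual S-Shuffle mechanism is not reproduced in this record. Purely as an illustration of the idea named in the abstract (making Hadoop's shuffle aware of the structured data and the query on it), the minimal sketch below shows a custom Hadoop partitioner that routes map output by a single query-relevant key column instead of hashing the entire raw record. The class name, the '|' field delimiter, and the assumption that the partitioning column is the first field are all hypothetical choices for this sketch, not details taken from the paper.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative sketch only: partition map output on one structured key column
// (e.g., the join or group-by column of the query) rather than on the whole
// opaque record, so rows that must meet in the same reducer are co-located.
public class ColumnAwarePartitioner extends Partitioner<Text, Text> {

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Assumption: the map output key is a '|'-delimited row prefix and
        // its first field is the column the query partitions on.
        String column = key.toString().split("\\|", 2)[0];
        // Mask the sign bit so the partition index is always non-negative.
        return (column.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

A job would enable such a partitioner with job.setPartitionerClass(ColumnAwarePartitioner.class); the point of the sketch is only that shuffle behavior can be driven by knowledge of the data layout and the query, which is the kind of cross-layer information the abstract says S-Shuffle exploits.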