DocumentCode
3078069
Title
Coordinated Resource Management for Large Scale Interactive Data Query Systems
Author
Wei Yan ; Yuan Xue
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Vanderbilt Univ., Nashville, TN, USA
fYear
2015
fDate
4-7 May 2015
Firstpage
677
Lastpage
686
Abstract
Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a clustering environment. As a result, the execution of each query is converted into a set of coordinated tasks including data retrieval, intermediate result computation and transfer, and result aggregation. To support a high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization and maintain fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a clustering environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with a simple fair resource share mechanism, and by 63.5% compared with the FIFO resource management mechanism.
Keywords
SQL; data analysis; interactive systems; optimisation; parallel processing; pattern clustering; public domain software; query processing; FIFO resource management mechanism; SQL-like query; TPC-DS workload; clustering environment; concurrent interactive query; coordinated multiple resource management problem; coordinated resource management; coordinated resource management solution; coordinated tasks; data retrieval; interactive ad hoc data query; large scale interactive data query systems; massive datasets; massively parallel data query; open source Impala system; price-based algorithm; simple fair resource share mechanism; utility-based optimization framework; Aggregates; Clustering algorithms; Distributed databases; Memory management; Optimization; Parallel processing; Resource management; fairness; interactive query; price/utility;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location
Shenzhen
Type
conf
DOI
10.1109/CCGrid.2015.149
Filename
7152533