Coordinated Resource Management for Large Scale Interactive Data Query Systems

Author

Wei Yan ; Yuan Xue

Author_Institution

Dept. of Electr. Eng. & Comput. Sci., Vanderbilt Univ., Nashville, TN, USA

fYear

2015

fDate

4-7 May 2015

Firstpage

677

Lastpage

686

Abstract

Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed and partitioned data in a clustering environment. As a result, the execution of each query is converted into a set of coordinated tasks, including data retrieval, intermediate result computation and transfer, and result aggregation. To support a high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization and maintain fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a clustering environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with a simple fair resource share mechanism, and 63.5% compared with the FIFO resource management mechanism.

Keywords

SQL; data analysis; interactive systems; optimisation; parallel processing; pattern clustering; public domain software; query processing; FIFO resource management mechanism; SQL-like query; TPC-DS workload; clustering environment; concurrent interactive query; coordinated multiple resource management problem; coordinated resource management; coordinated resource management solution; coordinated tasks; data retrieval; interactive ad hoc data query; large scale interactive data query systems; massive datasets; massively parallel data query; open source Impala system; price-based algorithm; simple fair resource share mechanism; utility-based optimization framework; Aggregates; Clustering algorithms; Distributed databases; Memory management; Optimization; Parallel processing; Resource management; fairness; interactive query; price/utility;

fLanguage

English

Publisher

IEEE

Conference_Titel

Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on

Conference_Location

Shenzhen

Type

conf

DOI

10.1109/CCGrid.2015.149

Filename

7152533