Title :
Coordinated Resource Management for Large Scale Interactive Data Query Systems
Author :
Wei Yan ; Yuan Xue
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Vanderbilt Univ., Nashville, TN, USA
Abstract :
Interactive ad hoc data query over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) have been built and deployed to support SQL-like queries over distributed and partitioned data in a cluster environment. As a result, the execution of each query is converted into a set of coordinated tasks including data retrieval, intermediate result computation and transfer, and result aggregation. To support a high request rate of concurrent interactive queries, coordinated management of multiple resources (e.g., bandwidth, CPU, memory) of the cluster environment is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization and maintain fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open source Impala system and conduct a set of experiments in a cluster environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with a simple fair resource share mechanism, and 63.5% compared with the FIFO resource management mechanism.
Keywords :
SQL; data analysis; interactive systems; optimisation; parallel processing; pattern clustering; public domain software; query processing; FIFO resource management mechanism; SQL-like query; TPC-DS workload; clustering environment; concurrent interactive query; coordinated multiple resource management problem; coordinated resource management; coordinated resource management solution; coordinated tasks; data retrieval; interactive ad hoc data query; large scale interactive data query systems; massive datasets; massively parallel data query; open source Impala system; price-based algorithm; simple fair resource share mechanism; utility-based optimization framework; Aggregates; Clustering algorithms; Distributed databases; Memory management; Optimization; Parallel processing; Resource management; fairness; interactive query; price/utility;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
DOI :
10.1109/CCGrid.2015.149