DocumentCode
3600011
Title
Multi-Q: Multiple Queries Optimization Based on MapReduce in Cloud
Author
Ding Ding ; Fang Dong ; Junzhou Luo
Author_Institution
Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
fYear
2014
Firstpage
100
Lastpage
107
Abstract
With the explosion of data in the past decade, big data has become a research hotspot in the information field. Many cloud-based distributed data processing platforms, such as Hadoop, Hive, and Pig, have been proposed to provide efficient and cost-effective solutions for big data query processing. However, most current research focuses on improving query processing performance from a system-level perspective without considering the characteristics of the queries themselves, such as query similarity. This leads to a large amount of redundant computation, reduces query execution efficiency, and thus limits improvements in multi-query processing performance. To solve this problem, we propose a multi-query optimization framework for MapReduce-oriented cloud environments (Multi-Q), which exploits the dependence between multiple queries to reuse query results. First, a cluster-based partition algorithm, called CPA, is used to logically partition the search range of the query workload. Second, a method for constructing a multi-queries reuse dependence graph (MRDG) from the cluster-based partition results is presented to describe the dependence between the multiple queries. Finally, a Multi-Q processing algorithm based on the reuse dependence graph is put forward to reuse query results and improve overall query processing performance. We evaluate our approach by deploying Multi-Q on Hadoop in a real cloud environment, called SEU-Cloud, and conducting extensive experiments based on the standard TPC-H benchmark. The results show that, compared with Hive, Multi-Q improves performance by approximately 39.3%.
Keywords
cloud computing; data handling; parallel processing; pattern clustering; query processing; CPA; Hadoop; Hive; MRDG construction method; MapReduce-oriented cloud environment; SEU-Cloud; TPC-H; cloud environment; cluster-based partition algorithm; logic partition; multiQ processing algorithm; multiple queries optimization; multiqueries optimization framework; multiqueries reuse dependence graph; query processing performance; Algorithm design and analysis; Big data; Clustering algorithms; Heuristic algorithms; Partitioning algorithms; Query processing; Uplink; Big data; Cloud Computing; Multi-queries Result Reuse; Query Optimization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Cloud and Big Data (CBD), 2014 Second International Conference on
Print_ISBN
978-1-4799-8086-4
Type
conf
DOI
10.1109/CBD.2014.20
Filename
7176078