DocumentCode :
592855
Title :
Communication cost optimization for cloud Data Warehouse queries
Author :
Kurunji, S. ; Tingjian Ge ; Benyuan Liu ; Chen, C.X.
Author_Institution :
Comput. Sci. Dept., Univ. of Massachusetts Lowell, Lowell, MA, USA
fYear :
2012
fDate :
3-6 Dec. 2012
Firstpage :
512
Lastpage :
519
Abstract :
Read-optimized databases are well suited to read-intensive data warehouse applications. Data in these applications grows rapidly and therefore needs a dynamically scalable environment such as the cloud. The cloud provides a flexible environment in which users can load data, execute queries, and scale resources on demand. However, the cloud has its own challenges. To reduce inter-node communication during query execution, tables are horizontally partitioned on the join attribute and related partitions are stored on the same physical system. In a cloud environment, it is not possible to ensure that these related partitions are always stored on the same physical system. As resources are scaled up, the number of nodes involved increases, which increases inter-node communication. This becomes critical when huge amounts of data (terabytes or petabytes) are stored across a large number of nodes. As the number of nodes and the data size increase, the communication message size also increases. All these factors result in increased bandwidth usage and performance degradation. As the number of joins in a query increases, performance degrades further. These problems emphasize the need for a good storage structure and query execution plan. In this paper, we propose a storage structure, PK-map, and a query processing algorithm. We show through experiments that this approach decreases not only the inter-node communication overhead but also the workload of joins.
Keywords :
data warehouses; query processing; resource allocation; software performance evaluation; storage management; PK-map storage structure; bandwidth usage; cloud data warehouse queries; communication cost optimization; communication message size; data loading; data size; dynamically scalable environment; flexible environment; horizontally partitioned tables; inter-node communication reduction; join attribute; performance degradation; query execution plan; query processing algorithm; read intensive data warehouse applications; read-optimized database; resource scaling; Cloud computing; Conferences; Data warehouses; Distributed databases; Indexes; Query processing; Cloud Storage; Communication Cost; Data Warehouse; Multi-join Query; Query Optimization; Read-Optimized Database;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4511-8
Electronic_ISBN :
978-1-4673-4509-5
Type :
conf
DOI :
10.1109/CloudCom.2012.6427580
Filename :
6427580