Author_Institution :
Dept. of Comput. Sci., Renmin Univ. of China, Beijing, China
Abstract :
With the explosive growth of big data, data-intensive analysis services need large-scale data storage and highly efficient parallel query processing in the cloud. Because most organizations adopt inexpensive clusters, these techniques must run on lower-end machines, and a key challenge is to optimize query processing in such an environment. In this paper, we present a new hybrid data access architecture, called HyDB, for providing data-intensive analysis services. HyDB features distributed data storage, parallel data access, and query optimization methods. First, we propose a new data partitioning method based on both workloads and co-located resources; it achieves higher consolidation and outperforms existing approaches. Second, we provide a new parallel access method that includes parallel query processing, optimal query plan generation, and optimal path selection using a plan tree pruning technique. Finally, we have implemented HyDB and conducted extensive experimental studies that confirm its efficiency.
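The abstract does not spell out how the plan tree is pruned, so the following is only a generic illustration of the idea, not HyDB's algorithm. It is a minimal branch-and-bound sketch over left-deep join orders: each node of the plan tree is a partial join order, and a subtree is skipped as soon as its accumulated cost can no longer beat the best complete plan found so far. The function name `best_join_order`, the uniform `selectivity` parameter, and the cost model (sum of intermediate result sizes) are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def best_join_order(
    cards: Dict[str, int], selectivity: float = 0.01
) -> Tuple[List[str], float]:
    """Search left-deep join orders, pruning subtrees that cannot win.

    cards maps a (hypothetical) table name to its row count. The cost of a
    plan is the sum of its intermediate result sizes, a toy cost model
    chosen only for illustration.
    """
    best_order: List[str] = []
    best_cost = float("inf")

    def expand(order: List[str], rows: float, cost: float) -> None:
        nonlocal best_order, best_cost
        # Prune: costs only grow, so this subtree cannot beat the best plan.
        if cost >= best_cost:
            return
        # Leaf of the plan tree: every table has been joined.
        if len(order) == len(cards):
            best_order, best_cost = list(order), cost
            return
        # Try smaller tables first so a good bound is found early.
        for table in sorted(cards, key=cards.get):
            if table in order:
                continue
            new_rows = rows * cards[table] * selectivity
            expand(order + [table], new_rows, cost + new_rows)

    for table in sorted(cards, key=cards.get):
        expand([table], float(cards[table]), 0.0)
    return best_order, best_cost


if __name__ == "__main__":
    order, cost = best_join_order({"a": 1000, "b": 10, "c": 100})
    print(order, cost)
```

Because the toy cost never decreases as tables are added, discarding a partial plan whose cost already matches or exceeds the current best cannot discard the optimum. A real optimizer would replace the cost model with statistics-driven estimates and would typically consider bushy plans as well.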
Keywords :
cloud computing; parallel processing; query processing; service-oriented architecture; storage management; trees (mathematics); HyDB; access optimization; colocated resources; data partitioning method; data storage; data-intensive analysis services; distributed data storage; hybrid data access architecture; lower-end machines; optimal path selection; optimal query plan generation; parallel data access; parallel query processing; plan tree pruning technique; query optimization; Distributed databases; Engines; Marketing and sales; Peer to peer computing; Query processing; Redundancy; MapReduce; data intensive service; data partition; database