Title :
Using Space and Attribute Partitioned Partial Replicas for Data Subsetting and Aggregation Queries
Author :
Weng, Li ; Catalyurek, Umit ; Kurc, Tahsin ; Agrawal, Gagan ; Saltz, Joel
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH
Abstract :
Partial replication is an optimization for speeding up the execution of queries on large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. In this paper we investigate methods for efficiently executing queries when replicas of a dataset exist; we assume the replicas have already been created and do not address the replica creation problem. We propose a cost model and an algorithm for the combined use of space-partitioned and attribute-partitioned replicas in executing data subsetting range queries. We then extend the cost model and propose a greedy algorithm for range queries with aggregation operations. The extended replica selection algorithm allows replicas to be partitioned unevenly across storage nodes, and different replicas can be partitioned across different subsets of storage nodes. We have implemented these techniques as part of an automatic data virtualization system and have used this system to evaluate their benefits. We demonstrate the efficacy of the algorithms on parallel machines using queries on datasets from oil reservoir simulation studies and satellite data processing applications.
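The abstract does not spell out the cost model or the greedy algorithm. As a rough illustration only, and not the authors' method, the sketch below treats replica selection as a weighted set-cover problem: a query asks for a box of grid cells and a set of attributes, each replica chunk covers some cells and some attributes (combining space and attribute partitioning), and the greedy step repeatedly picks the chunk with the most still-needed (cell, attribute) units per unit of cost. The names `Chunk` and `greedy_select`, the discrete grid, and the precomputed per-chunk `cost` are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk of a partial replica: an axis-aligned box of grid
    cells (half-open ranges) that stores a subset of the dataset attributes."""
    replica: str
    x: tuple[int, int]   # [x0, x1)
    y: tuple[int, int]   # [y0, y1)
    attrs: frozenset[str]
    cost: float          # assumed precomputed retrieval cost, must be > 0


def units(x, y, attrs):
    """Enumerate the (x, y, attribute) units covered by a box."""
    return {(i, j, a) for i in range(*x) for j in range(*y) for a in attrs}


def greedy_select(query_x, query_y, query_attrs, chunks):
    """Greedy weighted set cover: pick chunks until every (cell, attribute)
    unit requested by the query is covered by some chosen chunk."""
    needed = units(query_x, query_y, query_attrs)
    remaining = list(chunks)
    chosen = []
    while needed:
        best, best_ratio, best_gain = None, 0.0, set()
        for c in remaining:
            gain = needed & units(c.x, c.y, c.attrs)
            if not gain:
                continue
            ratio = len(gain) / c.cost  # newly covered units per unit cost
            if ratio > best_ratio:
                best, best_ratio, best_gain = c, ratio, gain
        if best is None:
            raise ValueError("query cannot be answered from the given chunks")
        chosen.append(best)
        remaining.remove(best)
        needed -= best_gain
    return chosen


if __name__ == "__main__":
    full = Chunk("orig", (0, 4), (0, 4), frozenset({"p", "s", "v"}), 10.0)
    attr_p = Chunk("attrP", (0, 4), (0, 4), frozenset({"p"}), 3.0)
    corner = Chunk("space", (0, 2), (0, 2), frozenset({"p", "s", "v"}), 2.0)
    plan = greedy_select((0, 4), (0, 4), {"p", "s"}, [full, attr_p, corner])
    print([c.replica for c in plan])
```

In this toy setup the last query selects `attrP`, then `space`, then `orig`, for a total cost of 15, although reading `orig` alone would cost 10. That is the usual caveat of a greedy heuristic: it is fast but not guaranteed to find the cheapest plan.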
Keywords :
data visualisation; greedy algorithms; query processing; replicated databases; very large databases; aggregation queries; attribute partitioned partial replica; cost model; data subsetting; data virtualization system; dataset replica; extended replica selection; greedy algorithm; oil reservoir simulation; partial replication; satellite data processing; space partitioned partial replica; storage system; Aggregates; Biomedical engineering; Biomedical informatics; Computer science; Costs; Data analysis; Data engineering; Data mining; Partitioning algorithms; Subcontracting;
Conference_Title :
2006 International Conference on Parallel Processing (ICPP 2006)
Conference_Location :
Columbus, OH
Print_ISBN :
0-7695-2636-5
DOI :
10.1109/ICPP.2006.73