DocumentCode :
2776002
Title :
Servicing range queries on multidimensional datasets with partial replicas
Author :
Weng, Li ; Catalyurek, Umit ; Kurc, Tahsin ; Agrawal, Gagan ; Saltz, Joel
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume :
2
fYear :
2005
fDate :
9-12 May 2005
Firstpage :
726
Abstract :
Partial replication is an optimization that speeds up the execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, reorganized, and redistributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated that uses the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas that reduces query execution time. We show the efficiency of the proposed method using datasets and queries from oil reservoir simulation studies on a cluster machine.
Keywords :
query processing; replicated databases; cluster machine; distributed scientific dataset; heuristic algorithm; multidimensional dataset; oil reservoir simulation; partial replication; program compiler; range query; Biomedical engineering; Computer science; Data analysis; Data engineering; Data mining; Indexing; Information retrieval; Multidimensional systems; Parallel processing; Subcontracting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on
Print_ISBN :
0-7803-9074-1
Type :
conf
DOI :
10.1109/CCGRID.2005.1558635
Filename :
1558635