Title :
Multi-level Layout Optimization for Efficient Spatio-temporal Queries on ISABELA-compressed Data
Author :
Gong, Zhenhuan ; Lakshminarasimhan, Sriram ; Jenkins, John ; Kolla, Hemanth ; Ethier, Stephane ; Chen, Jackie ; Ross, Robert ; Klasky, Scott ; Samatova, Nagiza F.
Author_Institution :
North Carolina State Univ., Raleigh, NC, USA
Abstract :
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O subsystems of their runtime environments, not only making I/O the primary bottleneck but also consuming space that strains the storage capacities of many computing facilities. These problems are exacerbated by the need to run data-intensive analytics, such as querying a dataset by variable and spatio-temporal constraints, for which current database technologies commonly build query indices larger than the raw data. To help solve these problems, we present a parallel query-processing engine that handles both range queries and queries with spatio-temporal constraints on B-spline compressed data with user-controlled accuracy. Our method adapts to the widening gap between computation and I/O performance in three ways: it queries compressed metadata separated into bins by variable value, it uses Hilbert space-filling curves to optimize for spatial constraints, and it aggregates data accesses to improve the locality of per-bin stored data. Together these substantially reduce both the false-positive rate and latency-bound I/O operations (such as seeks). We show our method to be efficient with respect to storage, computation, and I/O compared to existing database technologies optimized for query processing on scientific data.
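The binning and Hilbert-curve ordering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 2D power-of-two grid, the equal-width value bins, and the names `hilbert_index` and `build_layout` are assumptions made only for illustration, and compression and parallelism are omitted entirely.

```python
from collections import defaultdict


def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve over an n-by-n grid.

    n must be a power of two. This is the standard iterative
    coordinate-to-distance conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def build_layout(values, n, num_bins, vmin, vmax):
    """Group grid cells into equal-width value bins (an assumption of this
    sketch), then order each bin's cells by Hilbert position so spatially
    close cells tend to be stored near each other.

    values: dict mapping (x, y) -> value
    returns: dict mapping bin id -> sorted list of (hilbert_pos, (x, y), value)
    """
    width = (vmax - vmin) / num_bins
    bins = defaultdict(list)
    for (x, y), v in values.items():
        b = min(int((v - vmin) / width), num_bins - 1)
        bins[b].append((hilbert_index(n, x, y), (x, y), v))
    for entries in bins.values():
        entries.sort()
    return dict(bins)


# Hypothetical 4x4 field where the value at each cell is x + y.
grid = {(x, y): float(x + y) for x in range(4) for y in range(4)}
layout = build_layout(grid, n=4, num_bins=3, vmin=0.0, vmax=6.0)
```

In this sketch a value-range query would touch only the bins whose value intervals overlap the range, and a spatial constraint within a bin maps to a small number of contiguous runs of Hilbert positions, which is the locality property the abstract relies on to cut down on seeks.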
Keywords :
Hilbert spaces; data compression; database indexing; input-output programs; natural sciences computing; optimisation; parallel databases; query processing; software performance evaluation; splines (mathematics); B-spline compressed data; Hilbert space-filling curves; I-O subsystems; ISABELA-compressed data; data access aggregation; database technologies; multilevel layout optimization; parallel query-processing engine; query indices; scientific simulations; spatio-temporal queries; Bandwidth; Computational modeling; Indexes; Layout; Organizations; Query processing; Splines (mathematics);
Conference_Title :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0975-2
DOI :
10.1109/IPDPS.2012.83