DocumentCode :
560190
Title :
ISABELA-QA: Query-driven analytics with ISABELA-compressed extreme-scale scientific data
Author :
Lakshminarasimhan, Sriram ; Jenkins, John ; Arkatkar, Isha ; Gong, Zhenhuan ; Kolla, Hemanth ; Ku, Seung-Hoe ; Ethier, Stephane ; Chen, Jackie ; Chang, C.S. ; Klasky, Scott ; Latham, Robert ; Ross, Robert ; Samatova, Nagiza F.
Author_Institution :
North Carolina State Univ., Raleigh, NC, USA
fYear :
2011
fDate :
12-18 Nov. 2011
Firstpage :
1
Lastpage :
11
Abstract :
Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. The increasing size of simulation output data demands a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed, rather than original full-size, data is a promising strategy for meeting the challenges of storage- and I/O-bound applications. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatio-temporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight: it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, of the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatio-temporal double- or single-precision floating-point data while offering a lightweight memory and disk storage footprint, with parallel, scalable, multi-node, multi-core, GPU-based query processing.
Keywords :
data compression; database indexing; meta data; query processing; ISABELA-QA; ISABELA-compressed extreme-scale scientific data; compression technology; data management; database indexing; extreme-scale simulations; indexing technology; metadata; parallel query processing engine; query-driven analytics; spatio-temporal multivariate scientific data; storage-and-I/O-bound application; user-specified range queries; Analytical models; Data models; Indexing; Query processing; Spline; Compression; Data reduction; Data-intensive computing; High performance applications; Query-driven Analytics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0
Type :
conf
Filename :
6114457