Title :
Efficient range distribution query in large-scale scientific data
Author :
Chaudhuri, Arindam ; Teng-Yok Lee ; Han-Wei Shen ; Peterka, Tom
Author_Institution :
Ohio State Univ., Columbus, OH, USA
Abstract :
Frequent access to raw data is no longer practical, if possible at all, for answering queries on large-scale data. This has led to the use of distribution-based data summaries, which can substitute for raw data to answer statistical queries of different kinds. Our work is concerned with the range distribution query, which returns the distribution of values within an axis-aligned region of any size. We address the challenge of maintaining the interactivity and accuracy of such query results in the presence of large data. This work presents a novel and efficient framework for pre-computing and storing a set of distributions that can be used to query any arbitrary region during post-processing. We adapt an integral image based data structure to answer such queries in constant time, and propose a similarity-based encoding technique to reduce the storage cost of the data structure. Our scheme exploits the similarity present among different regions in the data and, hence, among their respective distributions. We demonstrate the use of our technique in various applications that directly or indirectly require distributions.
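The abstract does not spell out how the integral image based structure is built, so the following is only a minimal 2D sketch of the general integral-histogram idea, not the authors' implementation. It assumes distributions are fixed-bin histograms; the function names `build_integral_histogram` and `range_histogram`, the half-open region convention, and the uniform binning are all illustrative assumptions. The similarity-based encoding step is not covered here.

```python
import numpy as np

def build_integral_histogram(data, n_bins, value_range):
    """Pre-compute per-bin prefix sums over a 2D scalar field (illustrative)."""
    lo, hi = value_range
    # Map each value to a bin index in [0, n_bins - 1] (uniform bins assumed).
    bins = np.clip(((data - lo) / (hi - lo) * n_bins).astype(int), 0, n_bins - 1)
    # One-hot encode: onehot[y, x, b] is 1 when cell (y, x) falls in bin b.
    onehot = np.eye(n_bins, dtype=np.int64)[bins]
    # Prefix sums along both spatial axes, padded with a leading zero row/column
    # so that region queries need no boundary special cases.
    ih = np.zeros((data.shape[0] + 1, data.shape[1] + 1, n_bins), dtype=np.int64)
    ih[1:, 1:] = onehot.cumsum(axis=0).cumsum(axis=1)
    return ih

def range_histogram(ih, y0, x0, y1, x1):
    """Histogram of the half-open region [y0, y1) x [x0, x1).

    Uses four lookups regardless of region size (inclusion-exclusion).
    """
    return ih[y1, x1] - ih[y0, x1] - ih[y1, x0] + ih[y0, x0]

# Small usage example on a 3 x 4 field with values in {0, 1, 2, 3}.
field = (np.arange(12).reshape(3, 4) % 4).astype(float)
ih = build_integral_histogram(field, n_bins=4, value_range=(0.0, 4.0))
hist = range_histogram(ih, 0, 1, 2, 3)  # rows 0-1, columns 1-2
```

Each query costs a fixed number of lookups per bin, independent of the region size, which is the constant-time property the abstract refers to. The cost is storage: one histogram per grid point, which is the overhead the abstract's encoding technique is meant to reduce.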
Keywords :
data structures; distributed processing; query processing; scientific information systems; statistical analysis; arbitrary region query; distribution-based data summaries; integral image based data structure; large-scale scientific data; query answering; range distribution query; raw data; similarity-based encoding technique; statistical queries; storage cost reduction
Conference_Title :
Large-Scale Data Analysis and Visualization (LDAV), 2013 IEEE Symposium on
Conference_Location :
Atlanta, GA
DOI :
10.1109/LDAV.2013.6675171