DocumentCode
611058
Title
Supporting a Light-Weight Data Management Layer over HDF5
Author
Yi Wang ; Yu Su ; Agrawal, Gagan
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2013
fDate
13-16 May 2013
Firstpage
335
Lastpage
342
Abstract
Scientific simulations are now being performed at finer temporal and spatial scales, leading to an explosion of output data and to challenges in storing, managing, disseminating, analyzing, and visualizing these datasets. Tools commonly used today for disseminating and visualizing such data have inherent limitations, making it extremely hard to deal with larger datasets. We have developed a light-weight data management tool that allows server-side subsetting and aggregation on scientific datasets stored in HDF5, one of the most popular scientific data formats. To support a variety of queries efficiently, our tool generates code for hyperslab selectors and content-based filtering, and parallelizes selection and aggregation queries using novel algorithms. Our tool also supports certain recent HDF5 features, including dimension scales and compound data types. Through extensive evaluation, we show that our system is capable of efficiently supporting a variety of queries, scaling performance by parallelizing the queries, and reducing wide-area data transfers through server-side data aggregation. We demonstrate that even for subsetting queries that are directly supported in OPeNDAP, a tool widely used by data dissemination portals, the sequential performance of our system is better.
Keywords
content management; database management systems; information filtering; query processing; scientific information systems; HDF5; OPeNDAP tool; aggregation query; compound data type; content-based filtering; data dissemination portal; data transfer; dimension scale; hyper slab selector; light-weight data management layer; light-weight data management tool; query subsetting; scientific data format; scientific dataset; scientific simulation; server-side data aggregation; server-side subsetting; Compounds; Data models; Data transfer; Data visualization; Indexes; Layout; Libraries
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on
Conference_Location
Delft
Print_ISBN
978-1-4673-6465-2
Type
conf
DOI
10.1109/CCGrid.2013.9
Filename
6546110
Link To Document