Title :
Towards parallel access of multi-dimensional, multi-resolution scientific data
Author :
Kumar, S. ; Pascucci, V. ; Vishwanath, V. ; Carns, P. ; Hereld, M. ; Latham, R. ; Peterka, T. ; Papka, M.E. ; Ross, R.
Author_Institution :
SCI Inst., Univ. of Utah, Salt Lake City, UT, USA
Abstract :
Large-scale scientific simulations routinely produce data of increasing resolution. Analyzing this data is key to scientific discovery. A critical bottleneck facing data analysis is the I/O time to access the data due to the disparity between a simulation's data layout and the data layout requirements of analysis applications. One method of addressing this problem is to reorganize the data in a manner that makes it more amenable to analysis and visualization. The IDX file format is one example of this approach. It orders data points so that they can be accessed at multiple resolution levels with favorable spatial locality and caching properties. IDX has been used successfully in fields such as digital photography and visualization of large scientific data, and is a promising approach for analysis of HPC data. Unfortunately, the existing tools for writing data in this format only provide a serial interface. HPC applications must therefore either write all data from a single process or convert existing data as a post-processing step, in either case failing to utilize available parallel I/O resources. In this work, we provide an overview of the IDX file format and the existing ViSUS library that provides serial access to IDX data. We investigate methods for writing IDX data in parallel and demonstrate that it is possible for HPC applications to write data directly into IDX format with scalable performance. Our preliminary results demonstrate 60% of the peak I/O throughput when reorganizing and writing the data from 512 processes on an IBM BG/P system. We also analyze the performance bottlenecks and propose future work towards a flexible and efficient implementation.
Keywords :
data analysis; data structures; data visualisation; input-output programs; scientific information systems; HPC data; IDX file format; ViSUS library; caching property; data layout; digital photography; favorable spatial locality; large scale scientific simulation; multiple resolution level; multiresolution scientific data; parallel I/O resource; parallel access; postprocessing step; scientific data visualization; scientific discovery; serial interface; writing data; Data models; Data visualization; Libraries; Optimization; Prototypes; Throughput; Writing; Multi dimensional data; Parallel IO
Conference_Title :
Petascale Data Storage Workshop (PDSW), 2010 5th
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4244-8913-8
DOI :
10.1109/PDSW.2010.5668090