DocumentCode :
1999923
Title :
Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications
Author :
Rengan Xu ; Araya-Polo, Mauricio ; Chapman, Barbara
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
2007
Lastpage :
2014
Abstract :
The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications must frequently perform I/O on very large volumes of data, which seriously degrades performance. This work is motivated by geophysical applications used for oil and gas exploration. These applications process terabyte-size datasets in HPC facilities. The datasets represent subsurface models and field-recorded data. In general terms, these applications read huge amounts of data as input and write huge amounts as intermediate and final results, and the underlying algorithms implement seismic imaging techniques. Traditional sequential I/O, even when coupled with advanced storage systems, cannot complete all I/O operations for such large volumes of data in an acceptable time. Parallel I/O is the general strategy for solving such problems. However, because many of these applications are dynamic, a parallel process does not know how much data it needs to write until its computation is done, and it also cannot identify the position in the file at which to write. To write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load balancing framework that is general enough for most of these applications. To reduce the expensive synchronization and communication overhead, we introduce an I/O node that only handles I/O requests and lets the compute nodes perform the I/O operations in parallel. Experiments using both POSIX I/O and memory-mapping interfaces indicate that our approach is scalable. For instance, with 16 processes, the parallel read bandwidth reaches the theoretical peak (2.5 GB/s) of the storage infrastructure. Parallel writing achieves speedups of up to 4.68x (POSIX I/O) and 7.23x (memory-mapping) over the serial I/O implementation. Since most geophysical applications are I/O bound, these results positively impact overall application performance and confirm the chosen strategy as the path to follow.
Keywords :
geophysical prospecting; geophysical techniques; geophysics computing; input-output programs; parallel memories; resource allocation; seismology; storage management; synchronisation; CPU; HPC; POSIX I/O interface; advance storage system; communication overhead; data intensive parallel application; dynamic load balancing framework; field recorded data; filesystem aware scalable I/O framework; gas exploration; geophysical application; memory mapping interface; oil exploration; parallel I/O paradigm; parallel writing; seismic imaging technique; subsurface model; synchronization overhead; terabyte size dataset; Bandwidth; Blades; Data models; Dynamic scheduling; Load management; Synchronization; Writing; Dynamic Load Balancing; Parallel File System; Parallel I/O;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
Type :
conf
DOI :
10.1109/IPDPSW.2013.196
Filename :
6651105