Title :
I/O performance characterization of Lustre and NASA applications on Pleiades
Author :
Saini, Shrikant ; Rappleye, J. ; Chang, Joana ; Barker, D. ; Mehrotra, Parul ; Biswas, Rubel
Author_Institution :
Ames Res. Center, NASA Adv. Supercomput. (NAS) Div., NASA, Moffett Field, CA, USA
Abstract :
In this paper, we study the performance of the Lustre file system using five scientific and engineering applications representative of the NASA workload on large-scale supercomputing systems such as NASA's Pleiades. To facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client-side and server-side metrics using SGI's Performance Co-Pilot (PCP) and generates a human-readable report on key metrics at the end of a batch job. These performance metrics are (a) the amount of data read and written, (b) the number of files opened and closed, and (c) the remote procedure call (RPC) size distribution (4 KB to 1024 KB, in powers of 2) for I/O operations. The RPC size distribution measures the efficiency of the Lustre client and can pinpoint problems such as small write sizes and disk fragmentation. These extracted statistics are useful in determining the I/O pattern of an application and can assist in identifying possible improvements to users' applications. Information on the number of file operations enables scientists to optimize the I/O performance of their applications. The amount of I/O data helps users choose the optimal stripe size and stripe count to enhance I/O performance. We demonstrate the usefulness of this tool on Pleiades for five production-quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre with that under NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. Finally, we examine the performance impact of Lustre stripe size and stripe count, and evaluate file-per-process access against a single shared file accessed by all processes, for the NASA workload using the parameterized IOR benchmark.
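The RPC size distribution described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool and does not use PCP or any Lustre interface; it only assumes a plain list of I/O request sizes in bytes. The function names and the 64 KB "small request" threshold are hypothetical choices made for illustration.

```python
from collections import Counter

KB = 1024

# Bucket upper bounds in KB: 4, 8, 16, ..., 1024 (powers of two),
# matching the range given in the abstract.
BUCKETS_KB = [4 << i for i in range(9)]


def bucket_kb(size_bytes):
    """Return the smallest power-of-two bucket (in KB) that holds size_bytes."""
    for bound in BUCKETS_KB:
        if size_bytes <= bound * KB:
            return bound
    # Assumption: requests larger than 1024 KB are clamped into the top bucket.
    return BUCKETS_KB[-1]


def rpc_histogram(sizes_bytes):
    """Count requests per bucket; buckets with no requests are reported as 0."""
    counts = Counter(bucket_kb(size) for size in sizes_bytes)
    return {bound: counts.get(bound, 0) for bound in BUCKETS_KB}


def small_rpc_fraction(hist, threshold_kb=64):
    """Fraction of requests at or below threshold_kb (hypothetical cutoff)."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    small = sum(n for bound, n in hist.items() if bound <= threshold_kb)
    return small / total


if __name__ == "__main__":
    # Made-up request sizes, for demonstration only.
    sample = [4096, 5000, 1024 * KB, 2048 * KB]
    hist = rpc_histogram(sample)
    for bound, n in hist.items():
        print(f"{bound:>5} KB: {n}")
    print("small-request fraction:", small_rpc_fraction(hist))
```

A histogram weighted toward the small buckets would suggest the small-write pattern the abstract mentions; what counts as "small" depends on the file system configuration and is left as a parameter here.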
Keywords :
input-output programs; natural sciences computing; parallel machines; remote procedure calls; software tools; I/O performance characterization; Lustre file system; Lustre stripe size; NASA applications; NFS; PCP; Pleiades; RPC; SGI Performance Co-Pilot; batch job; disk fragmentation; engineering applications; human-readable report; large-scale supercomputing systems; parameterized IOR benchmark; remote procedure call size distribution; scientific applications; small write sizes; software tool; stripe count; I/O cache effect; I/O latency; I/O performance evaluation; read and write policy; benchmarking; climate modeling; computational fluid dynamics
Conference_Title :
High Performance Computing (HiPC), 2012 19th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-2372-7
Electronic_ISBN :
978-1-4673-2370-3
DOI :
10.1109/HiPC.2012.6507507