DocumentCode
3120720
Title
Zest Checkpoint storage system for large supercomputers
Author
Nowoczynski, Paul ; Stone, Nathan ; Yanovich, Jared ; Sommerfield, Jason
Author_Institution
Pittsburgh Supercomputing Center, Pittsburgh, PA
fYear
2008
fDate
17 Nov. 2008
Firstpage
1
Lastpage
5
Abstract
The Pittsburgh Supercomputing Center (PSC) has developed a prototype distributed file system infrastructure that vastly increases aggregate write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios because applications must write checkpoint data, visualization data and post-processing (multi-stage) data. We have prototyped a scalable solution that will be directly applicable to future petascale compute platforms with on the order of 10^6 cores. Our design emphasizes high-efficiency scalability, low-cost commodity components, lightweight software layers, end-to-end parallelism, client-side caching and software parity, and a unique model of load-balancing outgoing I/O onto high-speed intermediate storage, followed by asynchronous reconstruction to a third-party parallel file system.
Keywords
checkpointing; data visualisation; input-output programs; mainframes; parallel processing; program verification; resource allocation; HPC I-O scenarios; asynchronous reconstruction; checkpoint storage system; client-side caching; data checkpoint; data visualization; end-to-end parallelism; high-speed intermediate storage; load-balancing; parallel file system; petascale compute platforms; post-processing data; prototype distributed file system infrastructure; software layers; software parity; Acceleration; Bandwidth; Data visualization; Distributed computing; File systems; Petascale computing; Prototypes; Software prototyping; Supercomputers; Writing; Client-side Raid; High-performance commodity storage; Parallel Application Checkpoint; Parallel I/O; Petascale Storage; Terabytes per second; log-structured filesystems;
fLanguage
English
Publisher
IEEE
Conference_Titel
Petascale Data Storage Workshop, 2008. PDSW '08. 3rd
Conference_Location
Austin, TX
Print_ISBN
978-1-4244-4208-9
Type
conf
DOI
10.1109/PDSW.2008.4811883
Filename
4811883