Title :
Reconciling scratch space consumption, exposure, and volatility to achieve timely staging of job input data
Author :
Monti, Henry M. ; Butt, Ali R. ; Vazhkudai, Sudharshan S.
Author_Institution :
Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
Abstract :
Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end computing systems. Processing such large input data typically involves copying (or staging) onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. The current practice of conservatively staging data as early as possible makes the data vulnerable to storage failures, which may entail re-staging and consequently reduced job throughput. To address this, we present a timely staging framework that uses a combination of job startup time predictions, user-specified intermediate nodes, and decentralized data delivery to coincide input data staging with job start-up. By delaying staging to when it is necessary, the exposure to failures and its effects can be reduced. Evaluation using both PlanetLab and simulations based on three years of Jaguar (No. 1 in Top500) job logs shows as much as 85.9% reduction in staging times compared to direct transfers, 75.2% reduction in wait time on scratch, and 2.4% reduction in usage/hour.
Keywords :
data analysis; digital storage; input-output programs; parallel machines; Jaguar job logs; PlanetLab; data copying; data processing; decentralized data delivery; dense data sources; exposure; high-end computing systems; high-speed storage; innovative scientific applications; job startup time predictions; scratch space consumption; supercomputer; sustained high I/O throughput; timely job input data staging; user-specified intermediate nodes; volatility; Application software; Computer science; Delay effects; Energy management; Laboratories; Large Hadron Collider; Large-scale systems; Mathematics; Supercomputers; Throughput; HPC center serviceability; High performance data management; data-staging; end-user data delivery;
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6442-5
DOI :
10.1109/IPDPS.2010.5470367