Title :
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
Author :
Dorier, Matthieu ; Antoniu, Gabriel ; Ross, Robert ; Kimpe, Dries ; Ibrahim, Shadi
Author_Institution :
ENS Cachan Brittany, IRISA, Rennes, France
Abstract :
Unmatched computation and storage performance in new HPC systems has led to a plethora of I/O optimizations ranging from application-side collective I/O to network and disk-level request scheduling on the file system side. As we deal with ever larger machines, the interference produced by multiple applications accessing a shared parallel file system concurrently becomes a major problem. Interference often breaks single-application I/O optimizations, dramatically degrading application I/O performance and, as a result, lowering machine-wide efficiency. This paper focuses on CALCioM, a framework that aims to mitigate I/O interference through the dynamic selection of appropriate scheduling policies. CALCioM allows several applications running on a supercomputer to communicate and coordinate their I/O strategy in order to avoid interfering with one another. In this work, we examine four I/O strategies that can be accommodated in this framework: serializing, interrupting, interfering and coordinating. Experiments on Argonne's BG/P Surveyor machine and on several clusters of the French Grid'5000 show how CALCioM can be used to efficiently and transparently improve the scheduling strategy between two otherwise interfering applications, given specified metrics of machine-wide efficiency.
Keywords :
input-output programs; parallel processing; scheduling; Argonne BG-P surveyor machine; CALCioM; French Grid'5000; HPC systems; IO interference mitigation; IO optimizations; application-side collective IO; cross-application coordination; disk-level request scheduling; file system side; network-level request scheduling; scheduling policies; scheduling strategy; shared parallel file system; storage performance; unmatched computation performance; Dynamic scheduling; Interference; Measurement; Optimization; Servers; Supercomputers; Throughput; CALCioM; Cross-Application Contention; I/O; Interference; Parallel File Systems;
Conference_Titel :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-3799-8
DOI :
10.1109/IPDPS.2014.27