Title :
On the benefits of a workflow-aware file system in high-performance computing systems
Author :
Wang, Yang ; Lu, Paul
Author_Institution :
Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta.
Abstract :
Traditional high-performance computing (HPC) systems have independent job schedulers and file systems that do not interact in substantial ways. We make the case that some integration of scheduler and file system can have three main benefits. First, the dataflow dependencies between the jobs in a workflow can be inferred by combining the scheduler's knowledge of the jobs (and possibly control-flow) and the file system's knowledge of the files accessed. Second, the dataflow information can be used to improve workflow instance concurrency when there are (potential) filename conflicts. Third, when workflows need to be re-computed, only the affected jobs need to be re-executed. We present the design and a simulation study of the Workflow-Aware File System (WaFS). Our design layers a namespace manager (NM) on top of existing file systems to provide, for example, a dataflow engine and a versioned file system. Our simulation study (with a specific set of application parameters) shows that a combined WaFS-aware file system and scheduler can significantly improve makespans for intensive workloads and be efficient in the re-computation of jobs.
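The first and third benefits named in the abstract can be illustrated with a small sketch. This is not the WaFS design or its namespace manager; it is a minimal illustration that assumes each job is described by a hypothetical (name, files read, files written) tuple listed in submission order. The function names infer_dataflow and affected_jobs and all file names are invented for the example. A job that reads a file is treated as depending on the most recent job that wrote it, and re-computation is limited to a changed job plus everything downstream of it.

```python
from collections import defaultdict, deque


def infer_dataflow(jobs):
    """Infer producer -> consumers edges from per-job file accesses.

    `jobs` is a list of (name, reads, writes) tuples in submission order
    (a hypothetical record format, assumed for this sketch).
    """
    last_writer = {}           # file name -> most recent job that wrote it
    edges = defaultdict(set)   # producer job -> set of consumer jobs
    for name, reads, writes in jobs:
        for f in reads:
            producer = last_writer.get(f)
            if producer is not None and producer != name:
                edges[producer].add(name)
        for f in writes:
            last_writer[f] = name
    return edges


def affected_jobs(edges, changed_job):
    """Return the changed job and every job reachable downstream of it."""
    seen = {changed_job}
    queue = deque([changed_job])
    while queue:
        job = queue.popleft()
        for consumer in edges.get(job, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


# Hypothetical four-job workflow; file names are made up.
workflow = [
    ("prep",  {"raw.dat"},   {"clean.dat"}),
    ("fit",   {"clean.dat"}, {"model.out"}),
    ("plot",  {"model.out"}, {"fig.png"}),
    ("stats", {"clean.dat"}, {"stats.txt"}),
]
dataflow = infer_dataflow(workflow)
rerun = affected_jobs(dataflow, "fit")
```

In this toy workflow, changing "fit" leaves "prep" and "stats" untouched, since neither consumes anything "fit" produces. A real system would also need to handle concurrent workflow instances and file versions, which this sketch omits.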
Keywords :
data flow analysis; file organisation; scheduling; workflow management software; control-flow; dataflow engine; dataflow information; high-performance computing systems; job schedulers; namespace manager; versioned file system; workflow instance concurrency; workflow-aware file system; Computational modeling; Concurrent computing; Control systems; Data mining; Engines; Feature extraction; File systems; Processor scheduling;
Conference_Title :
High-Performance Computing in Asia-Pacific Region, 2005. Proceedings. Eighth International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7695-2486-9
DOI :
10.1109/HPCASIA.2005.58