Title :
Acceleration of Data-Intensive Workflow Applications by Using File Access History
Author :
Horiuchi, Masaru ; Taura, Koichi
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Abstract :
Data I/O has been one of the major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such scheduling algorithms require the user to explicitly specify the files each job accesses, typically through stage-in/stage-out directives in the job description; such annotations are tedious at best and sometimes impossible to write. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting the input/output files of each job without user-supplied annotations. The method predicts a job's I/O files from the file access history collected in a profiling run that precedes the production run. We implemented the proposed method in the GXP Make workflow system and the Mogami distributed file system. We evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.
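The following is a minimal Python sketch of how a per-job file access history could drive data-aware job placement. It is not the GXP Make or Mogami implementation. The trace format `(job_id, path, mode)`, the greedy "most local input bytes" placement rule, and the assumption that each job's outputs are written on the node that runs it are all illustrative assumptions, and every function name here is hypothetical.

```python
from collections import defaultdict


def build_history(trace):
    """Group a profiling-run trace of (job_id, path, mode) records into
    predicted input/output file sets per job (mode is 'r' or 'w')."""
    history = defaultdict(lambda: {"inputs": set(), "outputs": set()})
    for job_id, path, mode in trace:
        history[job_id]["inputs" if mode == "r" else "outputs"].add(path)
    return dict(history)


def choose_node(job_id, history, file_location, file_size, nodes):
    """Return the node already holding the most bytes of the job's predicted
    inputs. Jobs with no history (or no local data) get the first node."""
    local_bytes = {n: 0 for n in nodes}
    for path in history.get(job_id, {}).get("inputs", ()):
        node = file_location.get(path)
        if node in local_bytes:
            local_bytes[node] += file_size.get(path, 0)
    # max() returns the first maximal element, so ties keep list order
    return max(nodes, key=lambda n: local_bytes[n])


def schedule(job_order, history, file_location, file_size, nodes):
    """Assign jobs in order; assumes each job's outputs land on its node."""
    location = dict(file_location)
    assignment = {}
    for job_id in job_order:
        node = choose_node(job_id, history, location, file_size, nodes)
        assignment[job_id] = node
        for path in history.get(job_id, {}).get("outputs", ()):
            location[path] = node
    return assignment, location


def local_access_ratio(assignment, history, location):
    """Fraction of predicted input reads served from the job's own node."""
    total = local = 0
    for job_id, node in assignment.items():
        for path in history.get(job_id, {}).get("inputs", ()):
            total += 1
            if location.get(path) == node:
                local += 1
    return local / total if total else 0.0
```

For example, if job `j1` reads `/a` (100 bytes, on `n1`) and `/b` (300 bytes, on `n2`) and writes `/c`, and job `j2` reads `/c`, this rule places both jobs on `n2`, and two of the three input reads are local. The greedy per-job choice is only one possible policy. A real scheduler would also need to weigh load balance against locality.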
Keywords :
distributed processing; file organisation; scheduling; workflow management software; GXP Make workflow system; Mogami distributed file system; data input-output; data-aware job scheduler; data-intensive workflow application; file access history; input-output throughput; local file access ratio; stage-in-stage-out directive; user-supplied annotation; workflow task scheduling;
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
DOI :
10.1109/SC.Companion.2012.31