• DocumentCode
    166691
  • Title

    Disk cache-aware task scheduling for data-intensive and many-task workflow

  • Author

    Tanaka, Masahiro ; Tatebe, Osamu

  • Author_Institution
    Center for Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    167
  • Lastpage
    175
  • Abstract
    Workflow scheduling to maximize I/O performance is one of the key issues in data-intensive, many-task computing. In our previous work, we proposed a locality-aware workflow scheduling method using Multi-Constraint Graph Partitioning. In this work, we focus on the read performance of input files from the disk cache (the buffer cache or page cache in main memory). To maximize the disk cache hit rate of input files, LIFO-order scheduling is effective, since newly created intermediate files are likely to be read soon. However, the LIFO policy suffers from the so-called “trailing task problem.” We therefore propose a hybrid scheduling strategy combining LIFO and HRF (Highest Rank First). In our strategy, one of the two policies is applied depending on the number of highest-rank tasks in the queue, which avoids this problem. In addition, we propose scheduling that overlaps computation and I/O. We implement our scheduling strategy for the Pwrake workflow system and the Gfarm distributed file system and evaluate it by executing data-intensive workflows on a computer cluster. Our scheduling strategy improves the performance of the copyfile workflow by 30% due to an increase in the disk cache hit rate, and the performance of the Montage workflow by 12% due to an increase in core utilization.
  • Keywords
    cache storage; distributed processing; scheduling; workflow management software; Gfarm distributed file system; HRF policy; LIFO-order scheduling; Pwrake workflow system; buffer cache; copyfile workflow; disk cache hit rate; disk cache-aware task scheduling; highest rank first policy; input-output performance; last-in first-out policy; many-task computing; multiconstraint graph partitioning; page cache; scheduling strategy; trailing task problem; workflow scheduling; Bismuth; Computers; File systems; Partitioning algorithms; Scheduling; Scheduling algorithms; distributed file system; many task computing; task scheduling; workflow system;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2014.6968774
  • Filename
    6968774
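
The hybrid LIFO/HRF selection described in the abstract might be sketched as follows. This is a minimal illustration, not the Pwrake implementation: the `Task` type, the `pick_next` function, the `threshold` parameter, and the direction of the switch (HRF when few highest-rank tasks are queued, LIFO otherwise) are all assumptions, since the abstract states only that the policy depends on the number of highest-rank tasks in the queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    rank: int  # assumed meaning: larger rank = longer remaining path in the workflow


def pick_next(queue: List[Task], threshold: int) -> Task:
    """Remove and return the next task to run.

    `queue` is ordered oldest-first, so the last element is the newest task.
    `threshold` is a hypothetical tuning parameter, not taken from the paper.
    """
    if not queue:
        raise IndexError("pick_next called on an empty queue")

    top_rank = max(task.rank for task in queue)
    n_top = sum(1 for task in queue if task.rank == top_rank)

    if n_top <= threshold:
        # HRF branch (assumed condition): run a highest-rank task,
        # choosing the most recently queued one among them.
        for i in range(len(queue) - 1, -1, -1):
            if queue[i].rank == top_rank:
                return queue.pop(i)

    # LIFO branch: run the newest task, whose input files are the most
    # likely to still be in the disk cache.
    return queue.pop()
```

For example, with `threshold=1` a queue holding a single rank-3 task and several rank-1 tasks takes the HRF branch, while a queue holding two rank-3 tasks falls back to LIFO and returns the newest entry.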