• DocumentCode
    3588697
  • Title

    Improving large-scale storage system performance via topology-aware and balanced data placement

  • Author

    Wang, Feiyi ; Oral, Sarp ; Gupta, Saurabh ; Tiwari, Devesh ; Vazhkudai, Sudharshan S.

  • Author_Institution
    Nat. Center for Comput. Sci., Oak Ridge Nat. Lab., Oak Ridge, TN, USA
  • fYear
    2014
  • Firstpage
    656
  • Lastpage
    663
  • Abstract
    With the advent of big data, the I/O subsystems of large-scale compute clusters are becoming a center of focus. More applications are putting greater demands on end-to-end I/O performance. These subsystems are often complex in design. They comprise multiple hardware and software layers to cope with the increasing capacity, capability, and scalability requirements of data-intensive applications. However, the shared nature of storage resources and the intrinsic interactions across these layers make it a great challenge to realize end-to-end performance gains. This paper proposes a topology-aware strategy that balances the load across resources to improve per-application I/O performance. We demonstrate the effectiveness of our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge Leadership Computing Facility (OLCF). Our experiments with both synthetic benchmarks and a real-world application show that, even under congestion, our proposed algorithm can significantly improve large-scale application I/O performance, resulting in both a reduction in application run time and a higher resolution of the simulation run.
  • Keywords
    input-output programs; large-scale systems; resource allocation; storage management; OLCF; Oak Ridge Leadership Computing Facility; Titan; balanced data placement; extreme-scale compute cluster; large-scale storage system performance; per-application I/O performance; topology-aware data placement; Benchmark testing; Frequency control; Indexes; Libraries; Resource management; Routing; Switches; High Performance Computing; Parallel File System; Performance Evaluation; Storage Area Network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/PADSW.2014.7097866
  • Filename
    7097866