• DocumentCode
    2396721
  • Title
    Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
  • Author
    Prabhakar, Ramya ; Vazhkudai, Sudharshan S. ; Kim, Youngjae ; Butt, Ali R. ; Li, Min ; Kandemir, Mahmut
  • fYear
    2011
  • fDate
    20-24 June 2011
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to improve their I/O performance. Traditional parallel file systems (PFS) in high performance computing (HPC) systems are unable to keep up with such high data rates, creating a storage wall. In this work, we present a novel multi-tiered storage architecture comprising hybrid node-local resources to construct a dynamic data staging area for extreme-scale machines. Such a staging ground serves as an impedance matching device between applications and the PFS. Our solution combines diverse resources (e.g., DRAM, SSD) in such a way as to approach the performance of the fastest component technology and the cost of the least expensive one. We have developed an automated provisioning algorithm that aids in meeting the checkpointing performance requirement of HPC applications by using a least-cost storage configuration. We evaluate our approach using both an implementation on a large-scale cluster and a simulation driven by six years' worth of Jaguar supercomputer job logs, and show that our approach, by choosing an appropriate storage configuration, achieves 41.5% cost savings with only negligible impact on performance.
  • Keywords
    checkpointing; data handling; file organisation; parallel machines; HPC systems; I/O performance; Jaguar supercomputer job logs; automated provisioning algorithm; checkpointing performance requirement; dynamic data staging area; extreme scale machine; extreme scale supercomputer; high performance computing; impedance matching; least cost storage configuration; massively parallel scientific application; multitiered data staging area; multitiered storage architecture; parallel file systems; Aggregates; Checkpointing; Computer architecture; Random access memory; Resource management; Supercomputers; Throughput; Hierarchical Storage; High performance computing; Provisioning; SSDs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems (ICDCS), 2011 31st International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1063-6927
  • Print_ISBN
    978-1-61284-384-1
  • Electronic_ISBN
    1063-6927
  • Type
    conf
  • DOI
    10.1109/ICDCS.2011.33
  • Filename
    5961683