• DocumentCode
    1925862
  • Title
    Overlapped checkpointing with hardware assist
  • Author
    Mitchell, Christopher ; Nunez, James ; Wang, Jun
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    We present a new approach to handling the demanding I/O workload incurred during checkpoint writes in high-performance computing (HPC). Prior efforts to improve performance have been bounded by factors such as hard drive limitations and the network. Our research overcomes these limitations by providing a method to: (1) write checkpoint data to a high-speed, non-volatile buffer, and (2) asynchronously write this data to permanent storage while computation resumes. This removes the hard drive from the critical data path, because our I/O-node-based buffers isolate the compute nodes from the storage servers. The solution is feasible because of industry-wide declines in the cost of high-capacity, non-volatile storage technologies. Testing was conducted using a standardized HPC benchmark on a test bed cluster at Los Alamos National Laboratory. Results show a definitive speedup factor for select workloads over writing directly to a typical global parallel file system, the Panasas ActiveScale File System.
  • Keywords
    buffer storage; checkpointing; parallel memories; I-O node; Los Alamos National Laboratory; Panasas ActiveScale File System; global parallel file system; hard drive; high performance computing; high-speed nonvolatile buffer; overlapped checkpointing; permanent storage; speedup factor; storage server; Benchmark testing; Buffer storage; Checkpointing; Costs; File systems; Hardware; High performance computing; Isolation technology; Laboratories; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    1552-5244
  • Print_ISBN
    978-1-4244-5011-4
  • Electronic_ISBN
    1552-5244
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2009.5289154
  • Filename
    5289154