• DocumentCode
    228663
  • Title

    Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems

  • Author

    Oral, Sarp ; Simmons, Jeff ; Hill, Jason ; Leverman, Dustin ; Wang, Feiyi ; Ezell, Matt ; Miller, Ross ; Fuller, Douglas ; Gunasekaran, Raghul ; Kim, Youngjae ; Gupta, Swastik ; Tiwari, Devesh ; Vazhkudai, Sudharshan S. ; Rogers, James H. ; Dillow, David ; Shi

  • Author_Institution
    Oak Ridge Leadership Comput. Facility, Oak Ridge Nat. Lab., Oak Ridge, TN, USA
  • fYear
    2014
  • fDate
    16-21 Nov. 2014
  • Firstpage
    217
  • Lastpage
    228
  • Abstract
    The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.
  • Keywords
    parallel processing; software engineering; storage management; HPC; PFS; data-centric parallel file system; file system software development; storage system design; technology evaluation; Bandwidth; Benchmark testing; Computational modeling; Data models; Procurement; Servers; System performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4799-5499-5
  • Type

    conf

  • DOI
    10.1109/SC.2014.23
  • Filename
    7013005