• DocumentCode
    3206512
  • Title

    Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0

  • Author

    Jin, Guohua ; Mellor-Crummey, John ; Adhianto, Laksono ; Scherer, William N., III ; Yang, Chaoran

  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    1089
  • Lastpage
    1100
  • Abstract
    Today's largest supercomputers have over two hundred thousand CPU cores and even larger systems are under development. Typically, these systems are programmed using message passing. Over the past decade, there has been considerable interest in developing simpler and more expressive programming models for them. Partitioned global address space (PGAS) languages are viewed as perhaps the most promising alternative. In this paper, we report on our experience developing a set of PGAS extensions to Fortran that we call Coarray Fortran 2.0 (CAF 2.0). Our design for CAF 2.0 goes well beyond the original 1998 design of Coarray Fortran (CAF) by Numrich and Reid. CAF 2.0 includes language support for many features including teams, collective communication, asynchronous communication, function shipping, and synchronization. We describe the implementation of these features and our experiences using them to implement the High Performance Computing Challenge (HPCC) benchmarks, including High Performance Linpack (HPL), Random Access, Fast Fourier Transform (FFT), and STREAM triad. On 4096 CPU cores of a Cray XT with 2.3 GHz single socket quad-core Opteron processors, we achieved 18.3 TFLOP/s with HPL, 2.01 GUP/s with Random Access, 125 GFLOP/s with FFT, and a bandwidth of 8.73 TByte/s with STREAM triad.
  • Keywords
    FORTRAN; fast Fourier transforms; message passing; multiprocessing systems; parallel machines; parallel programming; synchronisation; CAF 2.0; CPU core; Coarray Fortran 2.0; Cray XT; FFT; HPC challenge benchmark; PGAS extension; PGAS language; STREAM triad; asynchronous communication; collective communication; fast Fourier transform; function shipping; high performance Linpack; high performance computing challenge; partitioned global address space; programming model; random access; single socket quad-core Opteron processor; supercomputer; synchronization; Benchmark testing; Computer languages; Data structures; Electronics packaging; Program processors; Runtime; Synchronization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
  • Conference_Location
    Anchorage, AK
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-372-8
  • Electronic_ISBN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2011.104
  • Filename
    6012916