DocumentCode
3206512
Title
Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0
Author
Jin, Guohua ; Mellor-Crummey, John ; Adhianto, Laksono ; Scherer, William N., III ; Yang, Chaoran
fYear
2011
fDate
16-20 May 2011
Firstpage
1089
Lastpage
1100
Abstract
Today's largest supercomputers have over two hundred thousand CPU cores and even larger systems are under development. Typically, these systems are programmed using message passing. Over the past decade, there has been considerable interest in developing simpler and more expressive programming models for them. Partitioned global address space (PGAS) languages are viewed as perhaps the most promising alternative. In this paper, we report on our experience developing a set of PGAS extensions to Fortran that we call Coarray Fortran 2.0 (CAF 2.0). Our design for CAF 2.0 goes well beyond the original 1998 design of Coarray Fortran (CAF) by Numrich and Reid. CAF 2.0 includes language support for many features including teams, collective communication, asynchronous communication, function shipping, and synchronization. We describe the implementation of these features and our experiences using them to implement the High Performance Computing Challenge (HPCC) benchmarks, including High Performance Linpack (HPL), Random Access, Fast Fourier Transform (FFT), and STREAM triad. On 4096 CPU cores of a Cray XT with 2.3 GHz single socket quad-core Opteron processors, we achieved 18.3 TFLOP/s with HPL, 2.01 GUP/s with Random Access, 125 GFLOP/s with FFT, and a bandwidth of 8.73 TByte/s with STREAM triad.
Keywords
FORTRAN; fast Fourier transforms; message passing; multiprocessing systems; parallel machines; parallel programming; synchronisation; CAF 2.0; CPU core; Coarray Fortran 2.0; Cray XT; FFT; HPC challenge benchmark; PGAS extension; PGAS language; STREAM triad; asynchronous communication; collective communication; fast Fourier transform; function shipping; high performance Linpack; high performance computing challenge; message passing; parallel programming; partitioned global address space; programming model; random access; single socket quad-core Opteron processor; supercomputer; synchronization; Benchmark testing; Computer languages; Data structures; Electronics packaging; Program processors; Runtime; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
Conference_Location
Anchorage, AK
ISSN
1530-2075
Print_ISBN
978-1-61284-372-8
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2011.104
Filename
6012916
Link To Document