DocumentCode :
3206512
Title :
Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0
Author :
Jin, Guohua ; Mellor-Crummey, John ; Adhianto, Laksono ; Scherer, William N., III ; Yang, Chaoran
fYear :
2011
fDate :
16-20 May 2011
Firstpage :
1089
Lastpage :
1100
Abstract :
Today's largest supercomputers have over two hundred thousand CPU cores, and even larger systems are under development. Typically, these systems are programmed using message passing. Over the past decade, there has been considerable interest in developing simpler and more expressive programming models for them. Partitioned global address space (PGAS) languages are viewed as perhaps the most promising alternative. In this paper, we report on our experience developing a set of PGAS extensions to Fortran that we call Coarray Fortran 2.0 (CAF 2.0). Our design for CAF 2.0 goes well beyond the original 1998 design of Coarray Fortran (CAF) by Numrich and Reid. CAF 2.0 includes language support for many features including teams, collective communication, asynchronous communication, function shipping, and synchronization. We describe the implementation of these features and our experiences using them to implement the High Performance Computing Challenge (HPCC) benchmarks, including High Performance Linpack (HPL), Random Access, Fast Fourier Transform (FFT), and STREAM triad. On 4096 CPU cores of a Cray XT with 2.3 GHz single-socket quad-core Opteron processors, we achieved 18.3 TFLOP/s with HPL, 2.01 GUP/s with Random Access, 125 GFLOP/s with FFT, and a bandwidth of 8.73 TByte/s with STREAM triad.
Keywords :
FORTRAN; fast Fourier transforms; message passing; multiprocessing systems; parallel machines; parallel programming; synchronisation; CAF 2.0; CPU core; Coarray Fortran 2.0; Cray XT; FFT; HPC challenge benchmark; PGAS extension; PGAS language; STREAM triad; asynchronous communication; collective communication; fast Fourier transform; function shipping; high performance Linpack; high performance computing challenge; message passing; parallel programming; partitioned global address space; programming model; random access; single socket quad-core Opteron processor; supercomputer; synchronization; Benchmark testing; Computer languages; Data structures; Electronics packaging; Program processors; Runtime; Synchronization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
Conference_Location :
Anchorage, AK
ISSN :
1530-2075
Print_ISBN :
978-1-61284-372-8
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2011.104
Filename :
6012916