DocumentCode
1783363
Title
A Framework for Lattice QCD Calculations on GPUs
Author
Winter, F.T. ; Clark, M.A. ; Edwards, R.G. ; Joo, Balint
Author_Institution
Thomas Jefferson Nat. Accel. Facility, Newport News, VA, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
1073
Lastpage
1082
Abstract
Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high-performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder unaccelerated, which can create a serious Amdahl's law bottleneck. The Lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel layer from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory. Chroma implements algorithms in terms of this high-level interface. Thus, by porting the low-level layer, one effectively ports the whole application layer in one stroke. The QDP-JIT/PTX library, our reimplementation of the low-level layer, provides a framework for Lattice QCD calculations for the CUDA architecture. The complete software interface is supported, and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler which translates an assembly language (PTX) to GPU code. The existing expression templates enabled us to employ compile-time computations in order to build code generators and to automate the memory management for CUDA. Our implementation has allowed us to deploy the full Chroma gauge-generation program on large-scale GPU-based machines such as Titan and Blue Waters and accelerate the calculation by more than an order of magnitude.
Keywords
graphics processing units; parallel architectures; quantum chromodynamics; software architecture; Blue Waters; CUDA C-C++; GPU programming frameworks; JIT compiler; QDP-JIT-PTX library; Titan; accelerators; application layer; assembly language; code generators; compile-time computations; computational power; full Chroma gauge-generation program; large scale machines; lattice QCD calculations; lattice field theory; low-level programming; memory management; parallel computers; porting strategy; software architecture; software interface; stencil-like operations; time-dominant algorithms; Computer architecture; Generators; Graphics processing units; Indexes; Kernel; Lattices; Libraries; Application framework; C++; CUDA; GPU; JIT; Lattice QCD; PTX;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.112
Filename
6877336
Link To Document