  • DocumentCode
    1783363
  • Title
    A Framework for Lattice QCD Calculations on GPUs
  • Author
    Winter, F.T. ; Clark, M.A. ; Edwards, R.G. ; Joo, Balint
  • Author_Institution
    Thomas Jefferson Nat. Accel. Facility, Newport News, VA, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1073
  • Lastpage
    1082
  • Abstract
    Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high-performance code. As a result, porting applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder unaccelerated, which can raise a serious Amdahl's law issue. The Lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel layer from the application layer. The QCD Data-Parallel (QDP) software layer provides data types and expressions with stencil-like operations suitable for lattice field theory. Chroma implements algorithms in terms of this high-level interface. Thus, by porting the low-level layer, one effectively ports the whole application layer in a single step. The QDP-JIT/PTX library, our reimplementation of the low-level layer, provides a framework for Lattice QCD calculations on the CUDA architecture. The complete software interface is supported, and thus applications can run unaltered on GPU-based parallel computers. This reimplementation was made possible by the availability of a JIT compiler that translates an assembly language (PTX) to GPU code. The existing expression templates enabled us to employ compile-time computations in order to build code generators and to automate the memory management for CUDA. Our implementation has allowed us to deploy the full Chroma gauge-generation program on large-scale GPU-based machines such as Titan and Blue Waters and to accelerate the calculation by more than an order of magnitude.
  • Keywords
    graphics processing units; parallel architectures; quantum chromodynamics; software architecture; Blue Waters; CUDA C/C++; GPU programming frameworks; JIT compiler; QDP-JIT/PTX library; Titan; accelerators; application layer; assembly language; code generators; compile-time computations; computational power; full Chroma gauge-generation program; large scale machines; lattice QCD calculations; lattice field theory; low-level programming; memory management; parallel computers; porting strategy; software architecture; software interface; stencil-like operations; time-dominant algorithms; Computer architecture; Generators; Graphics processing units; Indexes; Kernel; Lattices; Libraries; Application framework; C++; CUDA; GPU; JIT; Lattice QCD; PTX;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.112
  • Filename
    6877336