• DocumentCode
    579747
  • Title
    Divergence Analysis with Affine Constraints
  • Author
    Sampaio, Diogo ; Martins, Rafael ; Collange, Sylvain ; Pereira, Fernando Magno Quintão
  • Author_Institution
    Dept. de Cienc. da Comput., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2012
  • fDate
    24-26 Oct. 2012
  • Firstpage
    67
  • Lastpage
    74
  • Abstract
    The rising popularity of graphics processing units is bringing renewed interest in code optimization techniques for SIMD processors. Many of these optimizations rely on divergence analyses, which classify variables as uniform, if they have the same value on every thread, or divergent, if they might not. This paper introduces a new kind of divergence analysis that can represent variables as affine functions of thread identifiers. We have implemented this analysis in Ocelot, an open source compiler, and used it to analyze a suite of 177 CUDA kernels from well-known benchmarks. We can mark about one fourth of all program variables as affine functions of thread identifiers. In addition to the novel divergence analysis, we also introduce the notion of a divergence-aware register allocator. This allocator uses information from our analysis either to rematerialize affine variables or to move uniform variables to shared memory. As evidence of its effectiveness, our divergence-aware allocator produces GPU code that is 29.70% faster than the code produced by Ocelot's register allocator. Divergence analysis with affine constraints has been publicly available in the Ocelot compiler since June 2012.
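The classification described in the abstract can be illustrated with a minimal sketch. This is not the Ocelot implementation and not necessarily the lattice the paper uses. It assumes a simplified abstract value `slope * tid + base`, and every name in it (`Affine`, `TID`, `DIVERGENT`, `const`, `uniform_unknown`, `add`, `scale`, `classify`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Affine:
    # Hypothetical abstract value: slope * tid + base.
    # slope is None -> unknown, may differ between threads (divergent).
    # base is None  -> unknown at compile time, but the same on every thread.
    slope: Optional[int]
    base: Optional[int]


TID = Affine(1, 0)              # the thread identifier itself
DIVERGENT = Affine(None, None)  # e.g. a value loaded from a divergent address


def const(c: int) -> Affine:
    """A compile-time constant: identical on every thread."""
    return Affine(0, c)


def uniform_unknown() -> Affine:
    """A value such as a kernel parameter: unknown, but shared by all threads."""
    return Affine(0, None)


def add(x: Affine, y: Affine) -> Affine:
    """Abstract addition: slopes add, and so do bases when both are known."""
    if x.slope is None or y.slope is None:
        return DIVERGENT
    base = None if x.base is None or y.base is None else x.base + y.base
    return Affine(x.slope + y.slope, base)


def scale(x: Affine, c: int) -> Affine:
    """Abstract multiplication by a compile-time integer constant."""
    if x.slope is None:
        return DIVERGENT
    return Affine(x.slope * c, None if x.base is None else x.base * c)


def classify(x: Affine) -> str:
    """Map an abstract value to uniform, affine, or divergent."""
    if x.slope is None:
        return "divergent"
    return "uniform" if x.slope == 0 else "affine"


# Usage: idx = 4 * tid + base_ptr, where base_ptr is a kernel parameter.
base_ptr = uniform_unknown()
idx = add(scale(TID, 4), base_ptr)
print(classify(idx))                           # affine
print(classify(add(base_ptr, const(1))))       # uniform
print(classify(add(TID, scale(TID, -1))))      # uniform: the tid terms cancel
print(classify(add(idx, DIVERGENT)))           # divergent
```

In this sketch, a variable classified as affine with a known slope could in principle be recomputed from the thread identifier instead of occupying a register, which is the kind of information the divergence-aware register allocator in the abstract relies on.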
  • Keywords
    multi-threading; operating system kernels; optimising compilers; parallel architectures; public domain software; shared memory systems; CUDA kernel suite; GPU code; Ocelot compiler; SIMD processors; affine constraints; code optimization techniques; divergence analysis; divergence aware register allocator; graphics processing units; open source compiler; program variables; shared memory; thread identifiers; uniform variable classification; uniform variables; Abstracts; Graphics processing units; Instruction sets; Optimization; Registers; Resource management; Synchronization; Divergence; GPU; SIMD;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on
  • Conference_Location
    New York, NY
  • ISSN
    1550-6533
  • Print_ISBN
    978-1-4673-4790-7
  • Type
    conf
  • DOI
    10.1109/SBAC-PAD.2012.22
  • Filename
    6374773