  • DocumentCode
    2534395
  • Title
    A Micro-benchmark Suite for AMD GPUs
  • Author
    Taylor, Ryan; Li, Xiaoming
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Delaware, Newark, DE, USA
  • fYear
    2010
  • fDate
    13-16 Sept. 2010
  • Firstpage
    387
  • Lastpage
    396
  • Abstract
    Optimizing programs for the Graphics Processing Unit (GPU) requires thorough knowledge of the values of the architectural features of this new computing platform. However, this knowledge is frequently unavailable, e.g., due to insufficient documentation, which is probably a result of the infancy of general-purpose computing on the GPU. What makes modeling program performance on the GPU even more difficult is that the exact value of some “architectural” parameters depends on how a GPU program interacts with those features. For example, AMD GPUs show different memory latencies when memory is accessed with address sequences that have different patterns. Current micro-benchmark suites such as X-Ray are unable to characterize the GPU. Clearly, a prerequisite for efficient code optimization and automatic tuning on the GPU is a systematic method to measure the architectural features and to identify the most basic program characteristics that determine the performance of a program on the new GPU architectures. In this paper, we present a micro-benchmark suite for AMD GPUs that supports the AMD StreamSDK. Our model identifies and measures a series of architectural features and basic program characteristics that are the most important and most predictive of program performance on the platform. The features and characteristics include vectorization, burst write latency, texture fetch latency, global read and write latency, ALU/Fetch operation ratio, domain size, and register usage, for both AMD's pixel shader and compute shader modes. Our performance model not only generates correct values for those parameters but also provides a clear picture of program performance on the GPU.
  • Keywords
    computer graphic equipment; coprocessors; ALU-fetch operation ratio; AMD GPU; AMD StreamSDK; AMD pixel shader; address sequences; architectural features; automatic tuning; basic program characteristics; burst write latency; code optimization; compute shader modes; domain size; general purpose computing; global read; graphic processing unit; memory latencies; microbenchmark suites; program performance modeling; register usage; texture fetch latency; vectorization; write latency; x-ray; Benchmark testing; Engines; Graphics processing unit; Hardware; Instruction sets; Kernel; Registers; AMD; ATI; GPU; benchmark;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2010 39th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1530-2016
  • Print_ISBN
    978-1-4244-7918-4
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2010.59
  • Filename
    5599097