• DocumentCode
    581012
  • Title

    Latency tolerance for Throughput Computing: Designer track

  • Author

    Lu, Chien-Ping ; Ko, Brian

  • fYear
    2012
  • fDate
    5-8 Nov. 2012
  • Firstpage
    524
  • Lastpage
    525
  • Abstract
    In Throughput Computing, data can be processed independently by a large number of threads running similar programs, referred to as kernels, or as shaders for graphics-specific workloads. A Throughput Computing device, such as a GPU, requires task latency tolerance to hold the context of the outstanding threads, and data latency tolerance to hold space for memory requests issued from the threads. The threads are grouped into thread groups. The register file and the associated number of outstanding thread groups should be sized according to the ratio of computing resources to load/store units. Such a ratio should reflect the balance between ALU and load/store instructions in the target workload.
  • Keywords
    computer graphics; multi-threading; ALU; GPU; data latency tolerance; graphics specific workload; load-store instructions; load-store units; memory requests; outstanding thread groups; throughput computing; Bandwidth; Context; Graphics processing units; Instruction sets; Kernel; Registers; Throughput
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Aided Design (ICCAD), 2012 IEEE/ACM International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1092-3152
  • Type

    conf

  • Filename
    6386720