Latency tolerance for Throughput Computing: Designer track

Author

Lu, Chien-Ping ; Ko, Brian

fYear

2012

fDate

5-8 Nov. 2012

Firstpage

524

Lastpage

525

Abstract

In Throughput Computing, data can be processed independently by a large number of threads running similar programs, referred to as kernels, or as shaders for graphics-specific workloads. A Throughput Computing device, such as a GPU, requires task latency tolerance to hold the context of the outstanding threads, and data latency tolerance to reserve space for memory requests issued by those threads. The threads are grouped into thread groups. The register file and the associated number of outstanding thread groups should be sized according to the ratio of computing resources to load/store units. This ratio should reflect the balance between ALU and load/store instructions in the target workload.
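
The sizing argument in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: it assumes a simple Little's-law style estimate in which each thread group issues a fixed number of ALU instructions per load and then stalls for the full memory latency. All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
import math


def thread_groups_needed(mem_latency_cycles, alu_per_load, cycles_per_instr=1):
    """Estimate how many thread groups must be resident to hide memory latency.

    Assumed model (not from the paper): each group issues `alu_per_load`
    ALU instructions plus one load, then waits `mem_latency_cycles`.
    """
    # Cycles of useful issue work one group supplies before it stalls.
    busy_cycles = (alu_per_load + 1) * cycles_per_instr
    # Other groups must cover the stall; add one for the stalled group itself.
    return math.ceil(mem_latency_cycles / busy_cycles) + 1


def register_file_bytes(groups, threads_per_group, regs_per_thread, bytes_per_reg=4):
    """Register file capacity needed to hold the context of all resident groups."""
    return groups * threads_per_group * regs_per_thread * bytes_per_reg


if __name__ == "__main__":
    # Hypothetical workloads: 4 vs. 9 ALU instructions per load, 400-cycle latency.
    for ratio in (4, 9):
        groups = thread_groups_needed(400, ratio)
        size = register_file_bytes(groups, threads_per_group=32, regs_per_thread=16)
        print(f"ALU:load = {ratio}:1 -> {groups} groups, {size} bytes of registers")
```

Under these assumptions, a workload with more ALU instructions per load needs fewer resident thread groups, and therefore a smaller register file, to cover the same memory latency. This is one way to read the abstract's point that the sizing should follow the ALU to load/store balance of the target workload.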

Keywords

computer graphics; multi-threading; ALU; GPU; data latency tolerance; graphics specific workload; load-store instructions; load-store units; memory requests; outstanding thread groups; throughput computing; Bandwidth; Context; Graphics processing units; Instruction sets; Kernel; Registers; Throughput

fLanguage

English

Publisher

IEEE

Conference_Title

Computer-Aided Design (ICCAD), 2012 IEEE/ACM International Conference on

Conference_Location

San Jose, CA

ISSN

1092-3152

Type

conf

Filename

6386720

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=581012