Regional Information Center for Science and Technology - An efficient compiler framework for cache bypassing on GPUs

DocumentCode :

659045

Title :

An efficient compiler framework for cache bypassing on GPUs

Author :

Xiaolong Xie ; Yun Liang ; Guangyu Sun ; Deming Chen

Author_Institution :

Center for Energy-Efficient Comput. & Applic., Peking Univ., Beijing, China

fYear :

2013

fDate :

18-21 Nov. 2013

Firstpage :

516

Lastpage :

523

Abstract :

Graphics Processing Units (GPUs) have become ubiquitous for general-purpose applications due to their tremendous computing power. Initially, GPUs employed only scratchpad memory as on-chip memory. Though scratchpad memory benefits many applications, it is not ideal for general-purpose applications with irregular memory accesses. Hence, GPU vendors have introduced caches in conjunction with scratchpad memory in recent generations of GPUs. The caches on GPUs are highly configurable: the programmer or the compiler can explicitly control cache access or bypass for global load instructions. This configurability opens up opportunities for optimizing cache performance. In this paper, we propose an efficient compiler framework for cache bypassing on GPUs. Our objective is to utilize the configurable cache efficiently and improve the overall performance of general-purpose GPU applications. To achieve this goal, we first characterize GPU cache utilization and develop performance metrics to estimate cache reuse and memory traffic. Next, we present efficient algorithms that judiciously select global load instructions for cache access or bypass. Finally, we integrate our techniques into an automatic compiler framework that leverages the PTX instruction set architecture. Experimental evaluation demonstrates that, compared to cache-all and bypass-all solutions, our techniques can achieve considerable performance improvement.

Keywords :

cache storage; graphics processing units; program compilers; GPU cache utilization; PTX instruction set architecture; bypass-all solutions; cache access control; cache bypassing; cache performance optimization; cache-all solutions; compiler framework; configurable cache; general purpose applications; global load instructions; graphics processing units; memory access; on-chip memory; performance metrics; scratchpad memory; Computer architecture; Graphics processing units; Instruction sets; Instruments; Measurement; Optimization; System-on-chip; Cache Bypassing; Compiler Optimization; GPU

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer-Aided Design (ICCAD), 2013 IEEE/ACM International Conference on

Conference_Location :

San Jose, CA

ISSN :

1092-3152

Type :

conf

DOI :

10.1109/ICCAD.2013.6691165

Filename :

6691165

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=659045