Title :
Mobile GPU shader processor based on non-blocking Coarse Grained Reconfigurable Arrays architecture
Author :
Kwontaek Kwon ; Sungjin Son ; Jeongsoo Park ; Jeongae Park ; Sangoak Woo ; Seokyoon Jung ; Soojung Ryu
Author_Institution :
Samsung Adv. Inst. of Technol., Samsung Electron., Yongin, South Korea
Abstract :
Coarse-grained reconfigurable array (CGRA) based processors provide high performance and energy efficiency as well as programmability through the ability to reconfigure the datapath connecting the ALU arrays. A CGRA-based processor executes loop kernels whose schedule must be fixed at compile time. This restriction makes CGRAs inefficient, particularly when accessing external memories or caches whose access time varies greatly. It is therefore challenging to build a CGRA-based high-performance, energy-efficient mobile GPU, because GPU shader execution usually involves massive texture memory accesses, which consist of accesses to the texture cache and external texture memory. In this paper, we present a Non-blocking Coarse Grained Reconfigurable Arrays (NBCGRA) architecture which can handle varying-latency operations efficiently. We also propose an improved CGRA-based GPU shader processor architecture built on it. A retry buffer enables threads to re-execute later, when the required memory access completes. With a non-blocking texture cache, the shader core can execute without stalls even in the case of cache misses. Together, these components greatly improve CGRA core throughput despite longer memory access latencies. Evaluation results show that our NBCGRA-based shader processor performs efficiently despite extreme variation in texture cache access latencies and reduces shader execution cycles by up to 68% with minimal hardware cost overhead.
Keywords :
cache storage; graphics processing units; reconfigurable architectures; ALU arrays; CGRA core throughput; NBC-GRA; datapath reconfiguration; external texture memory; graphics processing units; hardware cost overhead; loop kernels; mobile GPU shader processor; nonblocking coarse grained reconfigurable arrays architecture; retry buffer; texture cache access latency; texture memory access; Computer architecture; Graphics processing units; Instruction sets; Kernel; Pipelines; Registers; Schedules;
Conference_Title :
Field-Programmable Technology (FPT), 2013 International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4799-2199-7
DOI :
10.1109/FPT.2013.6718353