Title :
MRPB: Memory request prioritization for massively parallel processors
Author :
Jia, Wenhao ; Shaw, Kelly A. ; Martonosi, Margaret
Author_Institution :
Princeton Univ., Princeton, NJ, USA
Abstract :
Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches in the hope of providing smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have a mixed or unpredictable performance impact due to cache contention caused by the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16 KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65× and 1.27×, respectively, outperforming a state-of-the-art GPU cache management technique.
Keywords :
cache storage; graphics processing units; parallel processing; performance evaluation; GPU cache management technique; GPU caches; GPU performance; GPU programming; MRPB; PolyBench suites; Rodinia suites; address spaces; cache bypassing; caching efficiency; graphics processing units; hardware structure; limited per-thread cache capacity; massively parallel processors; massively parallel throughput-oriented systems; memory access latency; memory access traffic; memory hierarchies; memory request prioritization; memory request prioritization buffer; prioritization methods; request reordering; simulated L1 cache; thread counts; Graphics processing units; Hardware; Instruction sets; Kernel; Pipelines; Throughput;
Conference_Title :
2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
Conference_Location :
Orlando, FL
DOI :
10.1109/HPCA.2014.6835938