Regional Information Center for Science and Technology - Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems

DocumentCode :

2582782

Title :

Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems

Author :

Ausavarungnirun, Rachata ; Chang, Kevin Kai-Wei ; Subramanian, Lavanya ; Loh, Gabriel H. ; Mutlu, Onur

Author_Institution :

Carnegie Mellon Univ., Pittsburgh, PA, USA

fYear :

2012

fDate :

9-13 June 2012

Firstpage :

416

Lastpage :

427

Abstract :

When multiple processor (CPU) cores and a GPU integrated together on the same chip share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enough visibility across the global request stream, requiring relatively complex hardware implementations. This paper proposes a fundamentally new approach that decouples the memory controller's three primary tasks into three significantly simpler structures that together improve system performance and fairness, especially in integrated CPU-GPU systems. Our three-stage memory controller first groups requests based on row-buffer locality. This grouping allows the second stage to focus only on inter-application request scheduling. These two stages enforce high-level policies regarding performance and fairness, and therefore the last stage consists of simple per-bank FIFO queues (no further command reordering within each bank) and straightforward logic that deals only with low-level DRAM commands and timing. We evaluate the design trade-offs involved in our Staged Memory Scheduler (SMS) and compare it against three state-of-the-art memory controller designs. Our evaluations show that SMS improves CPU performance without degrading GPU frame rate beyond a generally acceptable level, while being significantly less complex to implement than previous application-aware schedulers. Furthermore, SMS can be configured by the system software to prioritize the CPU or the GPU at varying levels to address different performance needs.
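To make the three-stage structure described in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the class and method names (`Request`, `StagedScheduler`, `enqueue`, `flush`, `schedule_batch`, `issue`) and the parameters `max_batch` and `sjf_prob` are hypothetical. The batch-closing rule (close on a row miss or when the batch is full), the use of `flush()` as a stand-in for a batch timeout, and the stage-2 policy (shortest-job-first with probability `sjf_prob`, otherwise round-robin) are illustrative assumptions. The abstract only states that stage 2 performs inter-application scheduling and that system software can tune the CPU/GPU priority. DRAM timing is not modeled.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    source: str  # hypothetical source ID, e.g. "cpu0" or "gpu"
    bank: int
    row: int


class StagedScheduler:
    """Toy three-stage scheduler; all names and policies are illustrative assumptions."""

    def __init__(self, max_batch=4, sjf_prob=0.9, seed=0):
        self.max_batch = max_batch  # assumed cap on batch size
        self.sjf_prob = sjf_prob    # assumed knob: higher favors light sources (e.g. CPUs)
        self.rng = random.Random(seed)
        self.forming = {}           # stage 1: source -> open batch (list of requests)
        self.ready = {}             # stage 1 output: source -> deque of closed batches
        self.sources = []           # round-robin order, by first appearance
        self.rr_index = 0
        self.bank_fifos = {}        # stage 3: bank -> FIFO of requests

    # ---- Stage 1: batch formation by row-buffer locality ----
    def enqueue(self, req):
        if req.source not in self.forming:
            self.forming[req.source] = []
            self.ready[req.source] = deque()
            self.sources.append(req.source)
        batch = self.forming[req.source]
        # Close the open batch if this request targets a different row or the batch is full.
        if batch and ((batch[-1].bank, batch[-1].row) != (req.bank, req.row)
                      or len(batch) >= self.max_batch):
            self._close(req.source)
        self.forming[req.source].append(req)

    def _close(self, source):
        if self.forming[source]:
            self.ready[source].append(self.forming[source])
            self.forming[source] = []

    def flush(self):
        # Stand-in for a batch timeout: close every open batch.
        for source in self.sources:
            self._close(source)

    # ---- Stage 2: batch scheduling across sources ----
    def _pending(self, source):
        return sum(len(b) for b in self.ready[source]) + len(self.forming[source])

    def schedule_batch(self):
        candidates = [s for s in self.sources if self.ready[s]]
        if not candidates:
            return None
        if self.rng.random() < self.sjf_prob:
            # Shortest-job-first: pick the source with the fewest pending requests.
            source = min(candidates, key=self._pending)
        else:
            # Round-robin over sources, skipping those without a ready batch.
            while True:
                s = self.sources[self.rr_index % len(self.sources)]
                self.rr_index += 1
                if self.ready[s]:
                    source = s
                    break
        # Move the whole batch into the per-bank FIFOs of stage 3.
        for req in self.ready[source].popleft():
            self.bank_fifos.setdefault(req.bank, deque()).append(req)
        return source

    # ---- Stage 3: per-bank FIFOs, no reordering within a bank ----
    def issue(self):
        issued = []
        for bank in sorted(self.bank_fifos):
            if self.bank_fifos[bank]:
                issued.append(self.bank_fifos[bank].popleft())
        return issued
```

For example, with `max_batch=4` and `sjf_prob=1.0`, enqueuing six GPU requests to row 7 of bank 0 and two CPU requests to row 3 of bank 1, then calling `flush()`, leaves GPU batches of sizes 4 and 2. The first `schedule_batch()` call then returns `"cpu0"`, because that source has fewer pending requests. The point of the split is that stage 3 never reorders within a bank, because row-hit grouping (stage 1) and fairness decisions (stage 2) have already been made upstream.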

Keywords :

DRAM chips; graphics processing units; multiprocessing systems; scheduling; storage management; CPU performance; GPU frame rate; GPU traffic; application-aware memory scheduling; heterogeneous systems; integrated CPU-GPU systems; interapplication request scheduling; low-level DRAM commands; memory controller designs; multiple processor cores; off-chip main memory; row-buffer locality; simple per-bank FIFO queues; staged memory scheduling; three-stage memory controller; Central Processing Unit; Graphics processing unit; Instruction sets; Random access memory; Scheduling algorithms; System performance; Timing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Architecture (ISCA), 2012 39th Annual International Symposium on

Conference_Location :

Portland, OR

ISSN :

1063-6897

Print_ISBN :

978-1-4673-0475-7

Electronic_ISBN :

1063-6897

Type :

conf

DOI :

10.1109/ISCA.2012.6237036

Filename :

6237036

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2582782