DocumentCode :
2582782
Title :
Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems
Author :
Ausavarungnirun, Rachata ; Chang, Kevin Kai-Wei ; Subramanian, Lavanya ; Loh, Gabriel H. ; Mutlu, Onur
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2012
fDate :
9-13 June 2012
Firstpage :
416
Lastpage :
427
Abstract :
When multiple processor (CPU) cores and a GPU integrated together on the same chip share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enough visibility across the global request stream, requiring relatively complex hardware implementations. This paper proposes a fundamentally new approach that decouples the memory controller's three primary tasks into three significantly simpler structures that together improve system performance and fairness, especially in integrated CPU-GPU systems. Our three-stage memory controller first groups requests based on row-buffer locality. This grouping allows the second stage to focus only on inter-application request scheduling. These two stages enforce high-level policies regarding performance and fairness, and therefore the last stage consists of simple per-bank FIFO queues (no further command reordering within each bank) and straightforward logic that deals only with low-level DRAM commands and timing. We evaluate the design trade-offs involved in our Staged Memory Scheduler (SMS) and compare it against three state-of-the-art memory controller designs. Our evaluations show that SMS improves CPU performance without degrading GPU frame rate beyond a generally acceptable level, while being significantly less complex to implement than previous application-aware schedulers. Furthermore, SMS can be configured by the system software to prioritize the CPU or the GPU at varying levels to address different performance needs.
Keywords :
DRAM chips; graphics processing units; multiprocessing systems; scheduling; storage management; CPU performance; GPU frame rate; GPU traffic; application-aware memory scheduling; heterogeneous systems; integrated CPU-GPU systems; interapplication request scheduling; low-level DRAM commands; memory controller designs; multiple processor cores; off-chip main memory; row-buffer locality; simple per-bank FIFO queues; staged memory scheduling; three-stage memory controller; Central Processing Unit; Graphics processing unit; Instruction sets; Random access memory; Scheduling algorithms; System performance; Timing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture (ISCA), 2012 39th Annual International Symposium on
Conference_Location :
Portland, OR
ISSN :
1063-6897
Print_ISBN :
978-1-4673-0475-7
Electronic_ISBN :
1063-6897
Type :
conf
DOI :
10.1109/ISCA.2012.6237036
Filename :
6237036