Regional Information Center for Science and Technology (RICeST) - Power-performance co-optimization of throughput core architecture using resistive memory

DocumentCode :

602614

Title :

Power-performance co-optimization of throughput core architecture using resistive memory

Author :

Goswami, Nilanjan ; Bingyi Cao ; Tao Li

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of Florida, Gainesville, FL, USA

fYear :

2013

fDate :

23-27 Feb. 2013

Firstpage :

342

Lastpage :

353

Abstract :

Massively parallel computing on throughput computers such as GPUs requires myriad memory accesses to register files, on-chip scratchpad, caches, and off-chip DRAM. Unlike CPUs, these processors have a large register file and on-chip scratchpad memory, which consume a significant portion of compute core power (35%-45%). In this paper, we introduce a novel throughput architecture by integrating resistive memory (Spin Transfer Torque RAM) inside the compute core, which reduces leakage significantly, but introduces write power overhead and longer write latencies in GPU shared memory and register file accesses. We enhance the compute core by introducing a register file organization with a differential memory update mechanism to remove update redundancy during write operations. Furthermore, using a merged register-write mechanism and a write-back buffer, we coalesce multithreaded GPU register write accesses to save write energy. In addition, we introduce a hybrid shared memory design using SRAM and STT-MRAM that provides significant leakage/dynamic power savings without affecting performance. On average, across 23 GPGPU/graphics workloads, our schemes save 46% dynamic power due to register access (83% leakage power saving) with negligible performance degradation. On average, hybrid shared memory provides a 10% reduction in dynamic power with a maximum 1.6× performance improvement for the current workloads with no additional area overhead.
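
The abstract names a differential memory update mechanism but does not describe how it is implemented. As a rough illustration only, the sketch below assumes a read-before-write scheme in which the old and new register words are compared with XOR and only the differing cells are rewritten. The function names (`differential_write`, `cell_write_fraction`) and the 32-bit word width are hypothetical and do not come from the paper.

```python
def differential_write(old_word: int, new_word: int, width: int = 32) -> tuple[int, int]:
    """Return (stored_word, bits_written) for a differential update.

    Assumption: the old value is read first, and only cells whose
    value changes are actually written.
    """
    mask = (1 << width) - 1
    changed = (old_word ^ new_word) & mask      # bit positions that differ
    bits_written = bin(changed).count("1")      # number of cells to rewrite
    stored = (old_word ^ changed) & mask        # flip only the differing cells
    return stored, bits_written


def cell_write_fraction(old_words, new_words, width: int = 32) -> float:
    """Fraction of cells written, relative to rewriting every bit of every word."""
    total = 0
    for old, new in zip(old_words, new_words):
        _, bits = differential_write(old, new, width)
        total += bits
    return total / (width * len(old_words))
```

Under these assumptions, a write that repeats the value already held costs no cell writes, which is the kind of update redundancy the abstract says the mechanism removes. The sketch counts cell writes only; it does not model STT-MRAM write energy or latency.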

Keywords :

DRAM chips; SRAM chips; cache storage; graphics processing units; multi-threading; optimising compilers; parallel memories; performance evaluation; power aware computing; shared memory systems; GPGPU-graphics workloads; GPU shared memory; SRAM; STT-MRAM; caches; differential memory update mechanism; dynamic power; hybrid shared memory design; leakage-dynamic power savings; multithreaded GPU register write accesses; myriad memory accesses; off-chip DRAM; on-chip scratchpad memory; parallel computing; performance degradation; performance improvement; power-performance cooptimization; register file accesses; register file organization; register-write-mechanism; resistive memory; spin transfer torque RAM; throughput architecture; throughput computers; throughput core architecture; write latencies; write power overhead; write-back buffer; Arrays; Graphics processing units; Magnetic tunneling; Random access memory; Registers; System-on-chip;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on

Conference_Location :

Shenzhen

ISSN :

1530-0897

Print_ISBN :

978-1-4673-5585-8

Type :

conf

DOI :

10.1109/HPCA.2013.6522331

Filename :

6522331

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=602614