Title :
Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores
Author :
Hijaz, Farrukh ; Shi, Qingchuan ; Khan, Omar
Author_Institution :
Univ. of Connecticut, Storrs, CT, USA
Abstract :
Near-threshold voltage operation is widely acknowledged as a potential mechanism to achieve an order of magnitude reduction in energy consumption in future processors. However, processors cannot operate reliably below a minimum voltage, Vccmin, since hardware components may fail. SRAM bitcell failures in memory structures, such as caches, typically determine the Vccmin for a processor. Although the last-level shared caches (LLC) in modern multicores are protected using error correcting codes (ECC), the private caches have been left unprotected due to their performance sensitivity to the latency overhead of the ECC. This limits the operation of the processor at near-threshold voltages. In this paper, we propose mechanisms for near-threshold operation of private caches that do not require ECC support. First, we present a fine-grain mechanism to disable private cache lines that have bitcell failures at the target near-threshold voltage. Second, we propose two mechanisms to better manage the capacity-stressed private caches. (1) We utilize the OS-level classification of data as private or shared and evaluate a data placement mechanism that dynamically relocates the private data blocks to the LLC slice that is physically co-located with the requesting core. (2) We propose in-hardware yet low-overhead runtime profiling of the locality of each cache line that is classified as private data, and only allow such data to be cached in the private caches if it shows high spatio-temporal locality. These mechanisms allow the private caches to rely on the local LLC slice to cache the low-locality private data efficiently, and free up space to hold the more frequently used private data (as well as the shared data).
We show that combining cache line disabling with efficient cache management of private data performs better (in terms of application completion times) than using a single error correction, double error detection (SECDED) based ECC mechanism and/or cache line disabling.
Keywords :
SRAM chips; cache storage; data privacy; energy consumption; error correction codes; shared memory systems; spatiotemporal phenomena; storage management; ECC latency overhead; LLC slice; OS-level data classification; SECDED; SRAM bitcell failures; cache line disabling; capacity-stressed private caches; data placement mechanism evaluation; error correcting codes; fine-grain mechanism; future processors; in-hardware runtime; last-level shared caches; low-latency mechanisms; low-locality private data; low-overhead runtime; memory structures; near-threshold voltages; performance sensitivity; private cache near-threshold operation; private data cache management; shared data; shared memory multicores; single error correction double error detection; spatio-temporal locality; Benchmark testing; Multicore processing; Program processors; Protocols; Radiation detectors; Random access memory
Conference_Title :
Microarchitecture Workshops (MICROW), 2012 45th Annual IEEE/ACM International Symposium on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-4920-8
DOI :
10.1109/MICROW.2012.10