Title :
Hybrid latency tolerance for robust energy-efficiency on 1000-core data parallel processors
Author :
Crago, N.C. ; Azizi, O. ; Lumetta, S.S. ; Patel, S.J.
Abstract :
Currently, GPUs and data parallel processors leverage latency tolerance techniques such as multithreading and prefetching to maximize performance per Watt. However, choosing a technique that provides energy-efficiency on a wide variety of workloads is difficult, as the type of latency to tolerate, required hardware complexity, and energy consumption are directly related to application behavior. After qualitatively evaluating five commonly used latency tolerance techniques, we develop a hybrid technique utilizing multithreading and decoupled execution to maximize performance while minimizing hardware complexity and energy consumption across a wide variety of workloads. We compare our hybrid technique with the five commonly used techniques on a 1024-core data parallel processor by performing a comprehensive design space exploration, leveraging detailed performance and physical design models. By intelligently leveraging both decoupled execution and multithreading, our hybrid latency tolerance technique is able to improve energy-efficiency by 28% to 89% over any single technique on data parallel benchmarks. Compared to other combinations of latency tolerance techniques, we find that our hybrid latency tolerance technique provides the highest energy-efficiency by over 26%.
Keywords :
fault tolerant computing; microprocessor chips; multi-threading; power aware computing; 1000 core data parallel processors; GPU; decoupled execution; hybrid latency tolerance; multithreading; parallel processor; robust energy efficiency; Energy consumption; Hardware; Multithreading; Out of order; Prefetching;
Conference_Title :
High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4673-5585-8
DOI :
10.1109/HPCA.2013.6522327