DocumentCode :
2957973
Title :
Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth
Author :
Meng, Jiayuan ; Sheaffer, Jeremy W. ; Skadron, Kevin
Author_Institution :
Leadership Comput. Facility, Argonne Nat. Lab., Argonne, IL, USA
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
107
Lastpage :
118
Abstract :
Architectures that aggressively exploit SIMD often have many data paths execute in lockstep and use multi-threading to hide latency. They can yield high throughput in terms of area and energy efficiency for many data-parallel applications. To balance productivity and performance, many recent SIMD organizations incorporate implicit cache hierarchies. Examples of such architectures include Intel's MIC, AMD's Fusion, and NVIDIA's Fermi. However, unlike the software-managed streaming memories used in conventional graphics processors (GPUs), hardware-managed caches are more disruptive to SIMD execution; therefore, the interaction between implicit caching and aggressive SIMD execution may no longer follow the conventional wisdom gained from streaming memories. We show that due to more frequent memory latency divergence, lower latency in non-L1 data accesses, and relatively unpredictable L1 contention, cache hierarchies favor different SIMD widths and multi-threading depths than streaming memories. In fact, because these effects are subject to runtime dynamics, a fixed combination of SIMD width and multi-threading depth no longer works ubiquitously across diverse applications or when cache capacities are reduced due to pollution or power saving. To address these issues and reduce design risks, this paper proposes Robust SIMD, which provides wide SIMD and then dynamically adjusts SIMD width and multi-threading depth according to performance feedback. Robust SIMD can trade wider SIMD for deeper multi-threading by splitting a wider SIMD group into multiple narrower SIMD groups. Compared to the performance obtained by running every benchmark on its individually preferred SIMD organization, the same Robust SIMD organization performs similarly (sometimes even better, due to phase adaptation) and outperforms the best fixed SIMD organization by 17%.
When D-cache capacity is reduced due to runtime disruptiveness, Robust SIMD offers graceful performance degradation: with 25% polluted cache lines in a 32 KB D-cache, Robust SIMD performs 1.4× better than a conventional SIMD architecture.
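The abstract describes two ideas: splitting one wide SIMD group into several narrower groups (trading SIMD width for multi-threading depth while keeping the total number of lanes fixed), and choosing the configuration from performance feedback. The record does not specify the authors' actual search policy, so the sketch below is only an illustration under stated assumptions: `SimdConfig`, `neighbors`, and `adapt` are hypothetical names, `measure` is a hypothetical callback standing in for a hardware performance sample (e.g. throughput over an interval), and the search is a plain greedy hill climb by factors of two, not necessarily the paper's mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SimdConfig:
    width: int  # lanes per SIMD group
    depth: int  # number of SIMD groups interleaved to hide latency


def neighbors(cfg: SimdConfig, max_width: int, factor: int = 2) -> List[SimdConfig]:
    """Configurations reachable by one split or one merge; width * depth is preserved."""
    out = []
    # Split: each wide group becomes `factor` narrower groups (more multi-threading depth).
    if cfg.width % factor == 0:
        out.append(SimdConfig(cfg.width // factor, cfg.depth * factor))
    # Merge: regroup narrow groups into wider ones, capped by the physical SIMD width.
    if cfg.depth % factor == 0 and cfg.width * factor <= max_width:
        out.append(SimdConfig(cfg.width * factor, cfg.depth // factor))
    return out


def adapt(cfg: SimdConfig, measure: Callable[[SimdConfig], float],
          max_width: int, max_steps: int = 8) -> Tuple[SimdConfig, float]:
    """Greedy hill climbing: move to a better neighbor until no neighbor improves."""
    best, best_perf = cfg, measure(cfg)
    for _ in range(max_steps):
        improved = False
        for cand in neighbors(best, max_width):
            perf = measure(cand)  # hypothetical feedback sample for this configuration
            if perf > best_perf:
                best, best_perf, improved = cand, perf, True
        if not improved:
            break
    return best, best_perf


# Toy feedback function (hypothetical): performance peaks at a SIMD width of 8.
toy_measure = lambda c: -abs(c.width - 8)
result = adapt(SimdConfig(width=32, depth=4), toy_measure, max_width=32)
print(result)  # SimdConfig(width=8, depth=16), 0
```

Starting from a 32-wide, 4-deep organization, the toy run splits twice to reach 8 lanes by 16 groups, keeping 128 lanes in total; a real controller would sample performance over time and re-run the search as program phases change.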
Keywords :
cache storage; multi-threading; storage management; D-cache capacity; GPU; SIMD organization; SIMD width; aggressive SIMD execution; data-parallel applications; energy efficiency; graphics processors; hardware-managed caches; implicit cache hierarchies; implicit caching; multithreading depth; phase adaptation; robust SIMD; runtime disruptiveness; software-managed streaming memories; Hardware; Instruction sets; Message systems; Organizations; Robustness; Runtime; Vectors; Adaptive Architecture; Divergence; SIMD;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-4673-0975-2
Type :
conf
DOI :
10.1109/IPDPS.2012.20
Filename :
6267828