DocumentCode :
169103
Title :
Hardware/Software Helper Thread Prefetching on Heterogeneous Many Cores
Author :
Swamy, Bharath N. ; Ketterlin, Alain ; Seznec, Andre
Author_Institution :
IRISA/INRIA, Rennes, France
fYear :
2014
fDate :
22-24 Oct. 2014
Firstpage :
214
Lastpage :
221
Abstract :
Heterogeneous Many Cores (HMC) architectures that mix many simple/small cores with a few complex/large cores are emerging as a design alternative that can provide both fast sequential performance for single-threaded workloads and power-efficient execution for throughput-oriented parallel workloads. The availability of many small cores in an HMC presents an opportunity to utilize them as low-power helper cores to accelerate memory-intensive sequential programs mapped to a large core. However, the latency overhead of accessing small cores in a loosely coupled system limits their utility as helper cores. Also, it is not clear whether small cores can execute helper threads sufficiently in advance to benefit applications running on a larger, much more powerful core. In this paper, we present a hardware/software framework called core-tethering to support efficient helper threading on heterogeneous many cores. Core-tethering provides a co-processor-like interface to the small cores that (a) enables a large core to directly initiate and control helper execution on the helper core and (b) allows efficient transfer of execution context between the cores, thereby reducing the performance overhead of accessing small cores for helper execution. Our evaluation on a set of memory-intensive programs chosen from the standard benchmark suites shows that helper threads using moderately sized small cores can significantly accelerate a larger core compared to using a hardware prefetcher alone. We find that a small core provides a good trade-off against using an equivalent large core to run helper threads in an HMC. Additionally, helper prefetching on small cores, when used along with hardware prefetching, can provide an alternate design point to growing instruction window size for achieving higher sequential performance on memory-intensive applications.
Keywords :
benchmark testing; multi-threading; multiprocessing systems; parallel processing; performance evaluation; storage management; HMC architectures; core-tethering; hardware-software helper thread prefetching; heterogeneous many core architectures; instruction window size; latency overhead; loosely coupled system; low-power helper cores; memory intensive applications; memory intensive programs; memory-intensive sequential programs; performance overhead; power-efficient execution; single threaded workloads; standard benchmark suites; through-put oriented parallel workloads; Hardware; Multicore processing; Prefetching; Registers; Synchronization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on
Conference_Location :
Jussieu
ISSN :
1550-6533
Type :
conf
DOI :
10.1109/SBAC-PAD.2014.39
Filename :
6970667