DocumentCode :
3304092
Title :
Resource-Aware Compiler Prefetching for Many-Cores
Author :
Caragea, George C. ; Tzannes, Alexandros ; Keceli, Fuat ; Barua, Rajeev ; Vishkin, Uzi
Author_Institution :
Univ. of Maryland, College Park, MD, USA
fYear :
2010
fDate :
7-9 July 2010
Firstpage :
133
Lastpage :
140
Abstract :
Super-scalar, out-of-order processors that can have tens of read and write requests in the execution window place significant demands on Memory Level Parallelism (MLP). Multi- and many-cores with shared parallel caches further increase MLP demand. Current cache hierarchies, however, have been unable to keep up with this trend, with modern designs allowing only 4-16 concurrent cache misses. This disconnect is exacerbated by recent highly parallel architectures (e.g. GPUs), where per-core power and area budgets favor lighter cores with fewer resources. Support for hardware and software prefetching increases MLP pressure, since these techniques overlap multiple memory requests with existing computation. In this paper, we propose and evaluate a novel Resource-Aware Prefetching (RAP) compiler algorithm that is aware of the number of simultaneous prefetches supported and is optimized for that limit. We show that in situations where not enough resources are available to issue prefetch instructions for all references in a loop, it is more beneficial to decrease the prefetch distance and prefetch for as many references as possible, rather than use a fixed prefetch distance and skip prefetching for some references, as in current approaches. We implemented our algorithm in a GCC-derived compiler and evaluated its performance using an emerging fine-grained many-core architecture. Our results show that the RAP algorithm outperforms a well-known loop prefetching algorithm by up to 40.15% and the state-of-the-art GCC implementation by up to 34.79%. Moreover, we compare the RAP algorithm with a simple hardware prefetching mechanism, and show improvements of up to 24.61%.
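The trade-off described in the abstract can be illustrated with a small sketch. This is not the RAP algorithm from the paper, whose details are not given in this record. It assumes a simplified cost model in which each prefetched reference with distance d keeps roughly d prefetches in flight, so a loop with n references needs about n * d slots. The function names and the parameter max_in_flight are hypothetical.

```python
def fixed_distance_plan(num_refs, ideal_distance, max_in_flight):
    """Baseline policy (assumed): keep the distance fixed, skip references that do not fit."""
    _check(num_refs, ideal_distance, max_in_flight)
    # Each covered reference occupies `ideal_distance` slots under the assumed model.
    covered = min(num_refs, max_in_flight // ideal_distance)
    return {"distance": ideal_distance, "covered_refs": covered}


def resource_aware_plan(num_refs, ideal_distance, max_in_flight):
    """Resource-aware policy (assumed): shrink the distance so more references fit."""
    _check(num_refs, ideal_distance, max_in_flight)
    # Largest distance, capped at the ideal one, that lets every reference fit.
    distance = min(ideal_distance, max_in_flight // num_refs)
    if distance >= 1:
        return {"distance": distance, "covered_refs": num_refs}
    # Even distance 1 cannot cover every reference: cover as many as the budget allows.
    return {"distance": 1, "covered_refs": max_in_flight}


def _check(num_refs, ideal_distance, max_in_flight):
    """Reject inputs that make no sense under the assumed model."""
    if num_refs < 1 or ideal_distance < 1 or max_in_flight < 0:
        raise ValueError("need num_refs >= 1, ideal_distance >= 1, max_in_flight >= 0")


if __name__ == "__main__":
    # Hypothetical loop: 6 references, ideal distance 4, room for 8 outstanding prefetches.
    print(fixed_distance_plan(6, 4, 8))
    print(resource_aware_plan(6, 4, 8))
```

Under these assumed numbers the fixed-distance policy prefetches for only two of the six references, while the resource-aware policy covers all six at a shorter distance. Whether the shorter distance still hides enough latency depends on the target machine, which this sketch does not model.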
Keywords :
multiprocessing systems; optimising compilers; parallel architectures; parallel memories; storage management; GCC-derived compiler; Multicore processor; fine-grained many-core architecture; hardware-software prefetch; loop prefetching algorithm; memory level parallelism; resource aware compiler prefetching; shared parallel caches; super-scalar out-of-order processors; Distributed computing; Hardware; Multicore processing; Optimizing compilers; Out of order; Parallel processing; Prefetching; Read-write memory
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Computing (ISPDC), 2010 Ninth International Symposium on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7602-2
Type :
conf
DOI :
10.1109/ISPDC.2010.16
Filename :
5532508