Regional Information Center for Science and Technology - Resource-Aware Compiler Prefetching for Many-Cores

DocumentCode :

3304092

Title :

Resource-Aware Compiler Prefetching for Many-Cores

Author :

Caragea, George C. ; Tzannes, Alexandros ; Keceli, Fuat ; Barua, Rajeev ; Vishkin, Uzi

Author_Institution :

Univ. of Maryland, College Park, MD, USA

fYear :

2010

fDate :

7-9 July 2010

Firstpage :

133

Lastpage :

140

Abstract :

Super-scalar, out-of-order processors that can have tens of read and write requests in the execution window place significant demands on Memory Level Parallelism (MLP). Multi- and many-cores with shared parallel caches further increase MLP demand. Current cache hierarchies, however, have been unable to keep up with this trend, with modern designs allowing only 4-16 concurrent cache misses. This disconnect is exacerbated by recent highly parallel architectures (e.g., GPUs), where per-core power and area budgets favor lighter cores with fewer resources. Support for hardware and software prefetching increases MLP pressure, since these techniques overlap multiple memory requests with existing computation. In this paper, we propose and evaluate a novel Resource-Aware Prefetching (RAP) compiler algorithm that is aware of the number of simultaneous prefetches supported and is optimized for that limit. We show that in situations where not enough resources are available to issue prefetch instructions for all references in a loop, it is more beneficial to decrease the prefetch distance and prefetch for as many references as possible than to use a fixed prefetch distance and skip prefetching for some references, as current approaches do. We implemented our algorithm in a GCC-derived compiler and evaluated its performance using an emerging fine-grained many-core architecture. Our results show that the RAP algorithm outperforms a well-known loop prefetching algorithm by up to 40.15% and the state-of-the-art GCC implementation by up to 34.79%. Moreover, we compare the RAP algorithm with a simple hardware prefetching mechanism and show improvements of up to 24.61%.
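The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the RAP algorithm from the paper. It assumes a simplified cost model in which a reference prefetched `d` iterations ahead keeps up to `d` prefetches in flight, so covering `n` references needs roughly `n * d` slots. The function names, parameters, and this model are all hypothetical and chosen only for illustration.

```python
def fixed_distance_plan(num_refs, ideal_distance, max_outstanding):
    """Baseline policy: keep the ideal distance and skip references that do not fit.

    Assumed model (not from the paper): each covered reference occupies
    `ideal_distance` prefetch slots. Expects ideal_distance >= 1.
    Returns (references_covered, distance_used).
    """
    covered = min(num_refs, max_outstanding // ideal_distance)
    distance = ideal_distance if covered > 0 else 0
    return (covered, distance)


def resource_aware_plan(num_refs, ideal_distance, max_outstanding):
    """Resource-aware policy: shrink the distance so more references are covered.

    Same assumed model as above. Returns (references_covered, distance_used).
    """
    if num_refs == 0 or max_outstanding == 0:
        return (0, 0)
    # Largest distance (capped at the ideal one) that lets every reference fit.
    distance = min(ideal_distance, max_outstanding // num_refs)
    if distance >= 1:
        return (num_refs, distance)
    # Even a distance of 1 cannot cover every reference: cover as many as fit.
    return (max_outstanding, 1)


if __name__ == "__main__":
    # Hypothetical loop: 6 memory references, ideal distance of 4 iterations,
    # hardware that supports 8 simultaneous prefetches.
    print(fixed_distance_plan(6, 4, 8))
    print(resource_aware_plan(6, 4, 8))
```

Under these assumptions, the baseline covers only two of the six references at the full distance, while the resource-aware variant covers all six at a distance of one iteration. Whether the shorter distance still hides enough latency depends on the target machine, which is what the paper evaluates experimentally.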

Keywords :

multiprocessing systems; optimising compilers; parallel architectures; parallel memories; storage management; GCC-derived compiler; Multicore processor; fine-grained many-core architecture; hardware-software prefetch; loop prefetching algorithm; memory level parallelism; resource aware compiler prefetching; shared parallel caches; super-scalar out-of-order processors; Distributed computing; Hardware; Multicore processing; Out of order; Parallel processing; Prefetching; Read-write memory

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Computing (ISPDC), 2010 Ninth International Symposium on

Conference_Location :

Istanbul

Print_ISBN :

978-1-4244-7602-2

Type :

conf

DOI :

10.1109/ISPDC.2010.16

Filename :

5532508

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3304092