Title :
A Compiler-assisted Runtime-prefetching Scheme for Heterogenous Platforms
Author :
Shou, Baojiang ; Hou, Xionghui ; Chen, Li
Author_Institution :
ICT, Beijing, China
Abstract :
GPGPU has been widely adopted by industry and academia. For real industrial applications, however, data communication between CPUs and GPUs often slows down overall performance dramatically. Another difficulty raised by GPGPU is programming productivity. OpenMP is a high-level programming model widely accepted by industry. A software distributed shared memory (DSM) system is implemented to provide a logical shared memory space and to manage data communication between CPUs and GPUs. The DSM is block-based, and the block size is adjustable based on loop-partitioning parameters. In this work, we optimize the DSM system using a compiler-assisted data-prefetching scheme. Each separate memory has its own prefetching thread and prefetching worker. The prefetching thread looks ahead, applies inter-thread use-def analysis to determine which part of the USE region has already been generated by the computing threads, and produces prefetching requests. The prefetching worker carries out the prefetching operations.
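The prefetching decision described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a static loop partition in which each computing thread writes a contiguous chunk in order and reports how many iterations it has completed, and it approximates the compiler's inter-thread use-def analysis at runtime by marking which elements have been written. A block of the USE region is treated as ready only when every element of that block has been generated, since a block-based DSM transfers whole blocks. The names `ready_blocks`, `prefetch_worker`, `partition`, and `progress` are hypothetical, and Python lists stand in for CPU and GPU memory.

```python
import queue
import threading


def block_range(lo, hi, block_size):
    """Indices of blocks overlapping the half-open element range [lo, hi)."""
    if hi <= lo:
        return range(0)
    return range(lo // block_size, (hi - 1) // block_size + 1)


def ready_blocks(use_lo, use_hi, partition, progress, block_size, n):
    """Prefetching-thread check (hypothetical): blocks of the USE region
    [use_lo, use_hi) whose elements have all been generated (DEF).

    partition[t] = (lo, hi) is computing thread t's static loop chunk;
    progress[t] = iterations thread t has completed, in index order.
    """
    written = [False] * n
    for t, (lo, hi) in enumerate(partition):
        for i in range(lo, min(lo + progress[t], hi)):
            written[i] = True
    ready = []
    for b in block_range(use_lo, use_hi, block_size):
        start, end = b * block_size, min((b + 1) * block_size, n)
        # Whole-block transfer: every element in the block must be final.
        if all(written[start:end]):
            ready.append(b)
    return ready


def prefetch_worker(requests, src, dst, block_size):
    """Prefetching worker (hypothetical): copy each requested block from
    src to dst until a None sentinel arrives."""
    while True:
        b = requests.get()
        if b is None:
            break
        start = b * block_size
        end = min(start + block_size, len(src))
        dst[start:end] = src[start:end]


# Demo: 16 elements, 4-element blocks, two computing threads.
n, block_size = 16, 4
partition = [(0, 8), (8, 16)]
progress = [6, 8]                 # thread 0 wrote 0..5; thread 1 is done
host = list(range(100, 116))      # stand-in for CPU-side memory
device = [None] * n               # stand-in for GPU-side memory

requests = queue.Queue()
worker = threading.Thread(
    target=prefetch_worker, args=(requests, host, device, block_size))
worker.start()

ready = ready_blocks(4, 16, partition, progress, block_size, n)
for b in ready:
    requests.put(b)
requests.put(None)                # tell the worker to stop
worker.join()

print(ready)                      # blocks 2 and 3 are ready; block 1 is not
```

In a real system the prefetching thread would repeat this check as the computing threads advance, and the copy would be an asynchronous host-to-device transfer rather than a list-slice assignment.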
Keywords :
distributed shared memory systems; graphics processing units; multiprocessing systems; parallel architectures; program compilers; program control structures; storage management; CPU; CUDA code; DSM; GPU; OpenMP; Pthreads; USE region; compiler assisted runtime prefetching scheme; data communications manage; heterogenous platforms; high level programming model; interthread use-def analysis; logic shared memory space; loop partitioning parameters; prefetching thread; prefetching worker; software distributed shared memory system; Data communication; Industries; Kernel; Prefetching; Programming; Runtime;
Conference_Title :
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
Conference_Location :
Galveston, TX
Print_ISBN :
978-1-4577-1794-9
DOI :
10.1109/PACT.2011.48