Title :
PORPLE: An Extensible Optimizer for Portable Data Placement on GPU
Author :
Guoyang Chen ; Bo Wu ; Dong Li ; Xipeng Shen
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
Abstract :
GPUs are often equipped with complex memory systems, including global memory, texture memory, shared memory, constant memory, and various levels of cache. Where data are placed is important for the performance of a GPU program. However, the decision is difficult for a programmer to make because of architecture complexity and because the suitable data placement is sensitive to changes in input and architecture. This paper presents PORPLE, a portable data placement engine that enables a new way to solve the data placement problem. PORPLE consists of a mini specification language, a source-to-source compiler, and a runtime data placer. The language allows an easy description of a memory system; the compiler transforms a GPU program into a form amenable to runtime profiling and data placement; the placer, based on the memory description and data access patterns, identifies appropriate placement schemes for data on the fly and places the data accordingly. PORPLE is distinctive in being adaptive to program inputs and architecture changes, being transparent to programmers (in most cases), and being extensible to new memory architectures. Our experiments on three types of GPU systems show that PORPLE is able to consistently find optimal or near-optimal placements despite the large differences among GPU architectures and program inputs, yielding speedups of up to 2.08X (1.59X on average) on a set of regular and irregular GPU benchmarks.
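To make the idea of a placer driven by a memory description and access profiles more concrete, here is a minimal, hypothetical Python sketch. It is not PORPLE's specification language or its placement algorithm. The MemorySpec and ArrayProfile fields, the latency and capacity numbers, and the greedy "hottest array first, into the lowest-latency memory that fits" heuristic are all assumptions made for illustration. The paper's placer relies on its own performance model, which this sketch does not reproduce.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class MemorySpec:
    # Hypothetical memory description; NOT PORPLE's specification language.
    name: str
    latency_cycles: int   # assumed average access latency (illustrative only)
    capacity_bytes: int   # capacity available for placement
    read_only: bool       # kernels cannot write to this memory (e.g. constant)


@dataclass(frozen=True)
class ArrayProfile:
    # Hypothetical per-array profile, as a runtime profiler might collect it.
    name: str
    size_bytes: int
    accesses: int
    written: bool


def place(arrays, memories):
    """Greedy toy placer: visit the most-accessed arrays first and put each
    one in the lowest-latency memory that is writable if needed and still
    has enough free capacity."""
    remaining = {m.name: m.capacity_bytes for m in memories}
    plan = {}
    for arr in sorted(arrays, key=lambda a: a.accesses, reverse=True):
        candidates = [
            m for m in memories
            if not (m.read_only and arr.written)
            and remaining[m.name] >= arr.size_bytes
        ]
        if not candidates:
            raise ValueError(f"no memory can hold array {arr.name!r}")
        best = min(candidates, key=lambda m: m.latency_cycles)
        plan[arr.name] = best.name
        remaining[best.name] -= arr.size_bytes
    return plan


# Illustrative numbers only, not measurements from any real GPU.
MEMORIES = [
    MemorySpec("global", 400, 4 * 1024 * 1024 * KIB, False),
    MemorySpec("shared", 30, 48 * KIB, False),
    MemorySpec("constant", 40, 64 * KIB, True),
]

ARRAYS = [
    ArrayProfile("weights", 40 * KIB, 1_000_000, False),
    ArrayProfile("grid", 16 * KIB, 500_000, True),
    ArrayProfile("coeffs", 12 * KIB, 200_000, False),
]

if __name__ == "__main__":
    print(place(ARRAYS, MEMORIES))
```

With these made-up inputs, "weights" takes most of shared memory. "grid" is written, so it cannot go to read-only constant memory, and it no longer fits in shared memory, so it falls back to global memory. "coeffs" lands in constant memory. Describing memories as data, rather than hard-coding them, is what would let such a placer accept a new memory type without code changes. That mirrors the extensibility goal stated in the abstract, although the real system is far more detailed.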
Keywords :
data handling; graphics processing units; memory architecture; program compilers; specification languages; GPU program; PORPLE; architecture complexity; data access patterns; extensible optimizer; memory architectures; memory description; mini specification language; portable data placement; runtime data placer; runtime profiling; source-to-source compiler; Arrays; Engines; Graphics processing units; Instruction sets; Kernel; Runtime; cache; compiler; data placement; hardware specification language;
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
DOI :
10.1109/MICRO.2014.20