Enabling Portable Optimizations of Data Placement on GPU

Author

Guoyang Chen ; Bo Wu ; Dong Li ; Xipeng Shen

Volume

35

Issue

4

fYear

2015

fDate

July-Aug. 2015

Firstpage

16

Lastpage

24

Abstract

Modern GPU memory systems manifest more varieties, increasing complexities, and rapid changes. Different placements of data on memory systems often cause significant differences in program performance. Most current GPU programming systems rely on programmers to indicate the appropriate placements, but finding the appropriate placements is difficult for programmers in practice owing to the complexity and fast changes of memory systems, as well as the input sensitivity of appropriate data placements--that is, the best placements often differ when a program runs on a different input data set. This article introduces a software framework called Porple. It offers a solution that, for the first time, makes it possible to automatically enhance data placement across a GPU. Through Porple, a GPU program's data gets placed appropriately on memory on the fly, customized to the current input dataset. Moreover, when new memory systems arrive, it can easily adapt the placements accordingly. Experiments on three types of GPU systems show that Porple consistently finds optimal or near-optimal placement, yielding up to 2.93 times (1.75 times average on three generations of GPU) speedups compared to programmers' decisions.

Keywords

computational complexity; data handling; graphics processing units; storage management; GPU programming system; Porple software framework; complexity; modern GPU memory systems; near-optimal placement; portable data placement optimization; program performance; programmer decision; Benchmark testing; Complexity theory; Computer programs; Graphics processing units; Memory; Runtime; GPU; cache; compiler; data placement; hardware specification language

fLanguage

English

Journal_Title

Micro, IEEE

Publisher

IEEE

ISSN

0272-1732

Type

jour

DOI

10.1109/MM.2015.53

Filename

7106396