Title :
Mapping Streaming Applications onto GPU Systems
Author :
Huynh Phung Huynh ; Hagiescu, Andrei ; Ong Zhong Liang ; Weng-Fai Wong ; Goh, Rick Siow Mong
Author_Institution :
A*STAR Inst. of High Performance Comput., Singapore, Singapore
Abstract :
Graphics processing units leverage a large array of parallel processing cores to boost the performance of the streaming computation patterns frequently found in graphics applications. Unfortunately, while many other general purpose applications also exhibit streaming behavior, they possess unfavorable data layouts and poor computation-to-communication ratios that may penalize any straightforward GPU implementation. In this paper, we describe a performance-driven code generation framework that maps general purpose streaming applications onto GPU systems. This automated framework takes into account the idiosyncrasies of the GPU pipeline and the unique memory hierarchy. The framework has been implemented as a back-end to the StreamIt programming language compiler. Several key features in this framework ensure maximized performance and scalability. First, the generated code increases the effectiveness of the on-chip memory hierarchy by employing a heterogeneous mix of compute and memory access threads. Our scheme goes against the conventional wisdom of GPU programming, which is to use a large number of homogeneous threads. Second, we utilize an efficient stream graph partitioning algorithm to handle larger applications and achieve the best performance under the given on-chip memory constraints. Lastly, the framework maps complex applications onto multiple GPUs using a highly effective parallel execution scheme. Our comprehensive experiments show its scalability and significant speedup compared to a previous state-of-the-art solution.
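As an illustration of what partitioning a stream graph under an on-chip memory constraint can look like, the following is a minimal sketch, not the algorithm from the paper. It assumes the stream graph is a simple linear pipeline, that each filter has a known on-chip buffer requirement in bytes, and that a greedy grouping of consecutive filters is acceptable. The function name `partition_pipeline`, the filter names, and all sizes are hypothetical.

```python
def partition_pipeline(filters, budget):
    """Greedily group consecutive filters so each group fits in `budget` bytes.

    `filters` is a list of (name, bytes_needed) pairs in pipeline order.
    This is an illustrative heuristic, not the paper's partitioning algorithm.
    """
    partitions, current, used = [], [], 0
    for name, size in filters:
        if size > budget:
            # A single filter that cannot fit on chip cannot be placed at all.
            raise ValueError(f"{name} needs {size} bytes, over the {budget}-byte budget")
        if used + size > budget:
            # Close the current partition and start a new one.
            partitions.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        partitions.append(current)
    return partitions


# Hypothetical filter chain: (name, on-chip buffer bytes per steady-state iteration)
example_filters = [("source", 4096), ("fir", 8192), ("fft", 12288), ("sink", 2048)]
print(partition_pipeline(example_filters, 16384))
```

With a 16384-byte budget, the first two filters share a partition and the last two share another. A real partitioner would also have to weigh the cost of the off-chip traffic between partitions, which this sketch ignores.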
Keywords :
graphics processing units; pipeline processing; program compilers; GPU pipeline; GPU programming; GPU systems; StreamIt programming language compiler; general purpose streaming applications; memory access threads; on-chip memory hierarchy; performance-driven code generation framework; stream graph partitioning algorithm; streaming application mapping; Instruction sets; Layout; Memory management; Parallel processing; Schedules; Steady-state; GPU; multi-GPU; scalable; StreamIt; streaming application
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2013.195