DocumentCode
88629
Title
Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications
Author
Sartori, John ; Kumar, Rakesh
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Volume
15
Issue
2
fYear
2013
fDate
Feb. 2013
Firstpage
279
Lastpage
290
Abstract
Control and memory divergence between threads within the same execution bundle, or warp, have been shown to cause significant performance bottlenecks for GPU applications. In this paper, we exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a warp to take the same control path. Data herding eliminates memory divergence by forcing each thread in a warp to load from the same memory block. To safely and efficiently support branch and data herding, we propose a static analysis and compiler framework to prevent exceptions when control and data errors are introduced, a profiling framework that aims to maximize performance while maintaining acceptable output quality, and hardware optimizations to improve the performance benefits of exploiting error tolerance through branch and data herding. Our software implementation of branch herding on NVIDIA GeForce GTX 480 improves performance by up to 34% (13%, on average) for a suite of NVIDIA CUDA SDK and Parboil benchmarks. Our hardware implementation of branch herding improves performance by up to 55% (30%, on average). Data herding improves performance by up to 32% (25%, on average). Observed output quality degradation is minimal for several applications that exhibit error tolerance, especially for visual computing applications.
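The two herding operations described in the abstract can be illustrated with a small host-side simulation. This is a minimal sketch, not the authors' implementation: the abstract states only that all threads in a warp are forced onto the same control path and the same memory block. The majority-vote rule, the 128-byte block size, and the function names `herd_branch` and `herd_addresses` are assumptions made here for illustration.

```python
from collections import Counter

BLOCK_BYTES = 128  # assumed memory block size in bytes (hypothetical value)


def herd_branch(predicates):
    """Pick one branch outcome for the whole warp.

    Assumption: the outcome is chosen by majority vote over the per-thread
    predicates, with ties resolved toward "taken".
    """
    taken = sum(1 for p in predicates if p)
    return 2 * taken >= len(predicates)


def herd_addresses(addresses, block_bytes=BLOCK_BYTES):
    """Redirect every load in the warp to a single memory block.

    Assumption: the target is the block requested by the most threads.
    Each thread keeps its own offset within the block, so threads that
    originally addressed a different block read the wrong data, which is
    the error the application is expected to tolerate.
    """
    blocks = [a // block_bytes for a in addresses]
    target = Counter(blocks).most_common(1)[0][0]
    return [target * block_bytes + a % block_bytes for a in addresses]
```

In this sketch, a warp in which 20 of 32 threads evaluate the branch condition as true would run all 32 threads down the taken path, and a load whose address falls outside the most requested block is served from the same offset inside that block instead.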
Keywords
exception handling; fault tolerant computing; graphics processing units; parallel architectures; program compilers; program diagnostics; storage management; NVIDIA CUDA SDK; NVIDIA GeForce GTX 480; Parboil benchmark; branch herding; compiler framework; control divergence reduction; control path; data herding; error tolerance; error-tolerant GPU application; exception prevention; execution bundle; hardware optimization; memory block; memory divergence reduction; output quality degradation; performance bottleneck; profiling framework; software implementation; static analysis; visual computing application; warp threads; Computer architecture; Degradation; Graphics processing units; Hardware; Kernel; Message systems; Optimization; Error tolerance; energy efficiency
fLanguage
English
Journal_Title
Multimedia, IEEE Transactions on
Publisher
IEEE
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2012.2232647
Filename
6376229