TCU: Thread compaction unit for GPGPU applications on mobile graphics hardware

Author

Yu-Jung Chen ; Pai-Shun Ting ; Meng-Lin Yu ; Chia-Ming Chang ; Shao-Yi Chien

Author_Institution

Grad. Inst. of Electron. Eng., Nat. Taiwan Univ., Taipei, Taiwan

fYear

2012

fDate

17-19 Sept. 2012

Firstpage

146

Lastpage

151

Abstract

Thread divergence occurs frequently in GPGPU applications when parallel threads encounter branches, especially in multimedia workloads that involve high-level image processing and computer vision algorithms. Handling divergence with either predicate or branch instructions degrades parallel processing performance because of the synchronization cost. In this work, we propose a configurable thread compaction unit (TCU) to relieve this execution overhead. By compacting divergent threads at an early stage, through evaluating a compacting function with a compacting map, the TCU avoids both the redundant execution caused by predicate instructions and the repeated invocation of processors to fetch instructions and validate effective branches. Our simulation results show that, with the TCU, GPUs achieve up to 24.5x and 1.8x higher performance on the Viola-Jones face detection framework compared with predicate and branch instructions, respectively. Improvements of 4.3x and 1.4x are likewise achieved for salient region linear feature extraction. Finally, cache issues in the TCU architecture are discussed in detail.
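
The abstract does not specify the TCU hardware or the exact form of its compacting function and compacting map, so the following is only a rough software analogy of the general idea, not the authors' design. It assumes the compacting function is a per-thread predicate and the compacting map is the list of thread indices that pass it; the names `build_compacting_map`, `run_predicated`, and `run_compacted` are hypothetical. The sketch contrasts predication, where every lane runs the kernel and unwanted results are discarded, with compaction, where only the surviving threads run it.

```python
from typing import Callable, List, Optional, Tuple


def build_compacting_map(inputs: List[int],
                         compacting_fn: Callable[[int], bool]) -> List[int]:
    """Return the indices of threads whose input passes the predicate.

    Assumption: this list plays the role of a "compacting map".
    """
    return [tid for tid, x in enumerate(inputs) if compacting_fn(x)]


def run_predicated(inputs: List[int],
                   compacting_fn: Callable[[int], bool],
                   kernel: Callable[[int], int]
                   ) -> Tuple[List[Optional[int]], int]:
    """Predication analogy: every lane runs the kernel, inactive results are dropped."""
    results: List[Optional[int]] = [None] * len(inputs)
    executed = 0
    for tid, x in enumerate(inputs):
        value = kernel(x)          # work is done even for inactive lanes
        executed += 1
        if compacting_fn(x):       # the predicate only gates the write-back
            results[tid] = value
    return results, executed


def run_compacted(inputs: List[int],
                  compacting_fn: Callable[[int], bool],
                  kernel: Callable[[int], int]
                  ) -> Tuple[List[Optional[int]], int]:
    """Compaction analogy: only threads listed in the map run the kernel."""
    cmap = build_compacting_map(inputs, compacting_fn)
    results: List[Optional[int]] = [None] * len(inputs)
    for tid in cmap:
        results[tid] = kernel(inputs[tid])
    return results, len(cmap)


# Hypothetical workload: only inputs above a threshold need the costly kernel.
inputs = [3, 8, 1, 9, 4, 7, 2, 6]
is_candidate = lambda x: x > 5
costly_kernel = lambda x: x * x

pred_results, pred_work = run_predicated(inputs, is_candidate, costly_kernel)
comp_results, comp_work = run_compacted(inputs, is_candidate, costly_kernel)
print(pred_work, comp_work)
```

Both variants produce the same results, but in this toy input the compacted version invokes the kernel for half as many threads. The speedups quoted in the abstract come from the authors' simulations and cannot be inferred from this sketch.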

Keywords

computer vision; face recognition; feature extraction; graphics processing units; instruction sets; mobile computing; multi-threading; program compilers; GPGPU application; TCU architecture; Viola-Jones face detection framework; branch instruction; branching; compacting function; compacting map; computer vision algorithm; configurable thread compaction unit; execution overhead; high-level image processing; instruction fetching; mobile graphics hardware; multimedia algorithm; parallel processing performance; parallel threads; predicate instruction; redundant execution; salient region linear feature extraction; synchronization cost; thread divergence; Compaction; Face; Face detection; Feature extraction; Graphics processing units; Instruction sets

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on

Conference_Location

Banff, AB

Print_ISBN

978-1-4673-4570-5

Electronic_ISBN

978-1-4673-4571-2

Type

conf

DOI

10.1109/MMSP.2012.6343431

Filename

6343431