DocumentCode :
2014565
Title :
TCU: Thread compaction unit for GPGPU applications on mobile graphics hardware
Author :
Yu-Jung Chen ; Pai-Shun Ting ; Meng-Lin Yu ; Chia-Ming Chang ; Shao-Yi Chien
Author_Institution :
Grad. Inst. of Electron. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2012
fDate :
17-19 Sept. 2012
Firstpage :
146
Lastpage :
151
Abstract :
Thread divergence frequently occurs in GPGPU applications when parallel threads encounter branches, especially in multimedia workloads involving high-level image processing and computer vision algorithms. Both predicate and branch instructions degrade parallel processing performance because of synchronization cost. In this work, we propose a configurable thread compaction unit (TCU) to reduce this execution overhead. TCU compacts divergent threads at an early stage by evaluating a compacting function with a compacting map. This avoids the redundant execution caused by predicate instructions, as well as the repeated invocation of processors to fetch instructions and validate effective branches. Our simulation results show that, with TCU, GPUs achieve up to 24.5x and 1.8x performance improvements for the Viola-Jones face detection framework compared with predicate and branch instructions, respectively. Improvements of 4.3x and 1.4x are also achieved for salient region linear feature extraction. Finally, cache issues in the TCU architecture are discussed in detail.
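The abstract contrasts three ways of handling a divergent branch: predication, branch instructions, and compaction. The following is a minimal software sketch of that contrast, assuming a simple warp-counting model. It is not the authors' TCU hardware or their compacting function. The warp size, the use of a plain boolean mask as a stand-in for the "compacting map", and all function names are hypothetical, chosen only for illustration. The compaction step uses an exclusive prefix sum, which is the standard stream-compaction technique.

```python
from typing import List, Sequence

# Hypothetical SIMD width, kept small so the example is easy to trace by hand.
WARP_SIZE = 4


def compact_threads(mask: Sequence[bool]) -> List[int]:
    """Pack the IDs of threads whose branch condition holds into a dense list.

    An exclusive prefix sum over the mask gives each active thread its output
    slot (classic stream compaction). Treating the boolean mask as the paper's
    "compacting map" is an assumption made for illustration.
    """
    slots: List[int] = []
    running = 0
    for flag in mask:
        slots.append(running)  # exclusive prefix sum of active flags
        running += int(flag)
    packed = [0] * running
    for tid, flag in enumerate(mask):
        if flag:
            packed[slots[tid]] = tid
    return packed


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def warps_predicated(mask: Sequence[bool], w: int = WARP_SIZE) -> int:
    # Predication: every warp issues the branch body; inactive lanes are masked off.
    return ceil_div(len(mask), w)


def warps_branched(mask: Sequence[bool], w: int = WARP_SIZE) -> int:
    # Branch instruction: a warp skips the body only if none of its lanes is active.
    return sum(any(mask[i:i + w]) for i in range(0, len(mask), w))


def warps_compacted(mask: Sequence[bool], w: int = WARP_SIZE) -> int:
    # Compaction: active threads are regrouped into dense warps before the body runs.
    return ceil_div(len(compact_threads(mask)), w)


if __name__ == "__main__":
    mask = [x % 7 in (1, 2) for x in range(16)]  # active threads: 1, 2, 8, 9, 15
    print(compact_threads(mask))  # [1, 2, 8, 9, 15]
    print(warps_predicated(mask), warps_branched(mask), warps_compacted(mask))  # 4 3 2
```

With 16 threads and 5 active ones scattered across warps, predication issues the branch body in all 4 warps. The branch instruction skips only the one warp with no active lane, so it still issues 3. Compaction packs the 5 active threads into 2 warps. This toy count ignores the compaction cost itself and any memory or cache effects, which the paper treats separately.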
Keywords :
computer vision; face recognition; feature extraction; graphics processing units; instruction sets; mobile computing; multi-threading; program compilers; GPGPU application; TCU architecture; Viola-Jones face detection framework; branch instruction; branching; compacting function; compacting map; computer vision algorithm; configurable thread compaction unit; execution overhead; high-level image processing; instruction fetching; mobile graphics hardware; multimedia algorithm; parallel processing performance; parallel threads; predicate instruction; redundant execution; salient region linear feature extraction; synchronization cost; thread divergence; Compaction; Face; Face detection; Feature extraction; Graphics processing units; Instruction sets;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on
Conference_Location :
Banff, AB
Print_ISBN :
978-1-4673-4570-5
Electronic_ISBN :
978-1-4673-4571-2
Type :
conf
DOI :
10.1109/MMSP.2012.6343431
Filename :
6343431