• DocumentCode
    2014565
  • Title
    TCU: Thread compaction unit for GPGPU applications on mobile graphics hardware
  • Author
    Yu-Jung Chen ; Pai-Shun Ting ; Meng-Lin Yu ; Chia-Ming Chang ; Shao-Yi Chien
  • Author_Institution
    Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
  • fYear
    2012
  • fDate
    17-19 Sept. 2012
  • Firstpage
    146
  • Lastpage
    151
  • Abstract
    Thread divergence frequently occurs in GPGPU applications when parallel threads encounter branches, especially in multimedia workloads involving high-level image processing and computer vision. Both predicated and branch instructions degrade parallel processing performance because of synchronization cost. In this work, we propose a configurable thread compaction unit (TCU) to reduce this execution overhead. By compacting divergent threads at an early stage, evaluating a compacting function with a compacting map, the TCU avoids both the redundant execution caused by predicated instructions and the repeated invocation of processors to fetch instructions and validate effective branches. Our simulation results show that, with the TCU, GPUs achieve speedups of up to 24.5x and 1.8x on the Viola-Jones face detection framework compared with predicated and branch instructions, respectively. For salient region linear feature extraction, improvements of 4.3x and 1.4x are achieved. Finally, cache issues in the TCU architecture are discussed in detail.
  • Keywords
    computer vision; face recognition; feature extraction; graphics processing units; instruction sets; mobile computing; multi-threading; program compilers; GPGPU application; TCU architecture; Viola-Jones face detection framework; branch instruction; branching; compacting function; compacting map; computer vision algorithm; configurable thread compaction unit; execution overhead; high-level image processing; instruction fetching; mobile graphics hardware; multimedia algorithm; parallel processing performance; parallel threads; predicate instruction; redundant execution; salient region linear feature extraction; synchronization cost; thread divergence; Compaction; Face; Face detection; Feature extraction; Graphics processing units; Instruction sets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Multimedia Signal Processing (MMSP), 2012 IEEE 14th International Workshop on
  • Conference_Location
    Banff, AB
  • Print_ISBN
    978-1-4673-4570-5
  • Electronic_ISBN
    978-1-4673-4571-2
  • Type
    conf
  • DOI
    10.1109/MMSP.2012.6343431
  • Filename
    6343431