• DocumentCode
    1920749
  • Title

    Accelerating Boosting-Based Face Detection on GPUs

  • Author

    Oro, David ; Fernández, C. ; Segura, Carlos ; Martorell, Xavier ; Hernando, Javier

  • Author_Institution
    Herta Security, Barcelona, Spain
  • fYear
    2012
  • fDate
    10-13 Sept. 2012
  • Firstpage
    309
  • Lastpage
    318
  • Abstract
    The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper, we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, using concurrent kernel execution in combination with cascades generated with the Gentle Boost algorithm solves the problem of GPU underutilization and achieves a 5X speedup on 1080p videos on average over the fastest known implementations, while slightly improving accuracy. Finally, we also study the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task-level and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
  • Keywords
    face recognition; graphics processing units; high definition video; multiprocessing systems; parallel programming; training; video signal processing; 5X speedup; ALU; GPU; GPU underutilization; GentleBoost algorithm; SIMD; SMP platforms; accelerating boosting-based face detection; advanced single-chip many-core processors; boosted classifiers cascade; cascade evaluation kernel; cascade training process; concurrent kernel execution; conquer strategies; data-level parallelism; face detection pipeline; graphics workloads; high-definition video workloads; multicore CPU; parallelization strategy; real-time face detection; resource-intensive part; single-threaded implementations; Face; Face detection; Graphics processing unit; Instruction sets; Kernel; Parallel processing; Training; Face detection; GPU; parallel programming; video processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2012 41st International Conference on
  • Conference_Location
    Pittsburgh, PA
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4673-2508-0
  • Type

    conf

  • DOI
    10.1109/ICPP.2012.12
  • Filename
    6337592