DocumentCode
1920749
Title
Accelerating Boosting-Based Face Detection on GPUs
Author
Oro, David ; Fernández, C. ; Segura, Carlos ; Martorell, Xavier ; Hernando, Javier
Author_Institution
Herta Security, Barcelona, Spain
fYear
2012
fDate
10-13 Sept. 2012
Firstpage
309
Lastpage
318
Abstract
The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, the use of concurrent kernel execution in combination with cascades generated with the Gentle Boost algorithm solves the problem of GPU underutilization and achieves an average 5X speedup on 1080p videos over the fastest known implementations, while slightly improving accuracy. Finally, we also study the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task- and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
Keywords
face recognition; graphics processing units; high definition video; multiprocessing systems; parallel programming; training; video signal processing; 5X speedup; ALU; GPU; GPU underutilization; GentleBoost algorithm; SIMD; SMP platforms; accelerating boosting-based face detection; advanced single-chip many-core processors; boosted classifiers cascade; cascade evaluation kernel; cascade training process; concurrent kernel execution; conquer strategies; data-level parallelism; face detection pipeline; graphics workloads; high-definition video workloads; multicore CPU; parallelization strategy; real-time face detection; resource-intensive part; single-threaded implementations; Face; Face detection; Graphics processing unit; Instruction sets; Kernel; Parallel processing; Training; Face detection; GPU; parallel programming; video processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing (ICPP), 2012 41st International Conference on
Conference_Location
Pittsburgh, PA
ISSN
0190-3918
Print_ISBN
978-1-4673-2508-0
Type
conf
DOI
10.1109/ICPP.2012.12
Filename
6337592