DocumentCode :
3198441
Title :
LipActs: Efficient representations for visual speakers
Author :
Zavesky, Eric
Author_Institution :
AT&T Labs. Res., Middletown, NJ, USA
fYear :
2011
fDate :
11-15 July 2011
Firstpage :
1
Lastpage :
4
Abstract :
Video-based lip activity analysis has been successfully used to assist speech recognition for almost a decade. Surprisingly, the same capability has seen little use for near real-time visual speaker retrieval and verification, owing to tracking complexity, inadequate or difficult feature determination, and the need for a large amount of pre-labeled data for model training. This paper explores the performance of several solutions built on modern histogram of oriented gradients (HOG) features and several quantization techniques, and analyzes the benefits of temporal sampling and spatial partitioning to derive a representation called LipActs. Two datasets are used for evaluation: one with 81 participants derived from YouTube content of varying quality, and one with 3 participants derived from a forward-facing mobile video camera across 10 environments with varied lighting and capture angles. Over these datasets, LipActs with a moderate number of pooled temporal frames and multi-resolution spatial quantization offer an improvement of 37-73% over raw features when optimizing for lowest equal error rate (EER).
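The abstract reports results in terms of equal error rate (EER), the operating point at which the false accept rate equals the false reject rate. The sketch below shows one common way to approximate EER for a verification task. It is not taken from the paper, which this record does not describe at that level of detail. It assumes that higher scores mean "more likely the same speaker," that a trial is accepted when its score is at or above the threshold, and that sweeping thresholds over the observed scores is precise enough. The function name `equal_error_rate` is hypothetical.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumption (not from the paper): higher score = more likely same speaker,
    and a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Candidate thresholds: every distinct score seen in either set.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, best_eer, best_t = np.inf, None, None
    for t in thresholds:
        far = np.mean(impostor >= t)  # false accept rate: impostors accepted
        frr = np.mean(genuine < t)    # false reject rate: true speakers rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Average FAR and FRR at the closest crossing as the EER estimate.
            best_gap, best_eer, best_t = gap, (far + frr) / 2.0, t
    return best_eer, best_t


if __name__ == "__main__":
    # Hypothetical similarity scores, for illustration only.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.5, 0.3]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With these illustrative scores, one genuine trial (0.4) and one impostor trial (0.5) overlap, so FAR and FRR meet at 0.25 when the threshold is 0.5. A lower EER means the representation separates same-speaker and different-speaker trials more cleanly, which is the sense in which the abstract compares LipActs to raw features.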
Keywords :
computational complexity; speech recognition; video signal processing; EER; HOG; LipActs; YouTube content; equal error rate; histogram of oriented gradients; mobile video camera; quantization techniques; spatial partitioning; temporal sampling; tracking complexity; video based lip activity analysis; visual speaker retrieval; Detectors; Face; Feature extraction; Histograms; Quantization; Visualization; Vocabulary; feature extraction; learning systems; verification; video analysis;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Multimedia and Expo (ICME), 2011 IEEE International Conference on
Conference_Location :
Barcelona
ISSN :
1945-7871
Print_ISBN :
978-1-61284-348-3
Electronic_ISBN :
1945-7871
Type :
conf
DOI :
10.1109/ICME.2011.6012102
Filename :
6012102