LipActs: Efficient representations for visual speakers

Author

Zavesky, Eric

Author_Institution

AT&T Labs. Res., Middletown, NJ, USA

fYear

2011

fDate

11-15 July 2011

Firstpage

1

Lastpage

4

Abstract

Video-based lip activity analysis has been used successfully to assist speech recognition for almost a decade. Surprisingly, the same capability has not been heavily used for near-real-time visual speaker retrieval and verification, owing to tracking complexity, inadequate or difficult feature determination, and the need for a large amount of pre-labeled data for model training. This paper explores the performance of several solutions that use modern histogram of oriented gradients (HOG) features and several quantization techniques, and analyzes the benefits of temporal sampling and spatial partitioning to derive a representation called LipActs. Two datasets are used for evaluation: one with 81 participants derived from varying-quality YouTube content, and one with 3 participants derived from a forward-facing mobile video camera in 10 environments with varied lighting and capture angles. Over these datasets, LipActs with a moderate number of pooled temporal frames and multi-resolution spatial quantization offer an improvement of 37-73% over raw features when optimizing for the lowest equal error rate (EER).

Keywords

computational complexity; speech recognition; video signal processing; EER; HOG; LipActs; YouTube content; equal error rate; histogram of oriented gradients; mobile video camera; quantization techniques; spatial partitioning; temporal sampling; tracking complexity; video based lip activity analysis; visual speaker retrieval; Detectors; Face; Feature extraction; Histograms; Quantization; Visualization; Vocabulary; feature extraction; learning systems; verification; video analysis

fLanguage

English

Publisher

ieee

Conference_Titel

Multimedia and Expo (ICME), 2011 IEEE International Conference on

Conference_Location

Barcelona

ISSN

1945-7871

Print_ISBN

978-1-61284-348-3

Electronic_ISBN

1945-7871

Type

conf

DOI

10.1109/ICME.2011.6012102

Filename

6012102