DocumentCode
394733
Title
Audio-visual speaker recognition using time-varying stream reliability prediction
Author
Chaudhari, Upendra V. ; Ramaswamy, Ganesh N. ; Potamianos, Gerasimos ; Neti, Chalapathy
Author_Institution
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
5
fYear
2003
fDate
6-10 April 2003
Abstract
We examine a time-varying, context-dependent information fusion methodology for multi-stream authentication based on audio and video data collected simultaneously during a user's interaction with a system. Scores obtained from the two data streams are combined based on the relative local richness of each stream, as compared to the training data or derived model, and on the stability of each stream. The results show that the proposed technique outperforms the use of video or audio data alone, as well as the use of fused data streams (via concatenation). Of particular note, the performance improvements are achieved for clean, high-quality speech, whereas previous efforts focused on degraded speech conditions.
Keywords
audio signal processing; audio user interfaces; audio-visual systems; biometrics (access control); gesture recognition; speaker recognition; speech-based user interfaces; video signal processing; audio data; audio-visual speaker recognition; context dependent methodology; fused data streams; information fusion methodology; multi-stream authentication; relative local richness; reliability prediction; time-varying stream; training data; user interaction; video data; Authentication; Degradation; Robustness; Speaker recognition; Speech recognition; Stability; Statistics; Streaming media; Time varying systems; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1200070
Filename
1200070