DocumentCode :
3348327
Title :
Audio-visual graphical models for speech processing
Author :
Hershey, John ; Attias, Hagai ; Jojic, Nebojsa ; Kristjansson, Trausti
Author_Institution :
Machine Perception Lab., Univ. of California, San Diego, CA, USA
Volume :
5
fYear :
2004
fDate :
17-21 May 2004
Abstract :
Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically, audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-model self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short (30-second) sequence of audio-visual data. We show some results for speech detection and enhancement, and discuss extensions to the model that are under investigation.
Keywords :
Gaussian processes; adaptive signal processing; feature extraction; inference mechanisms; learning (artificial intelligence); speech processing; video signal processing; Gaussian mixture video model; audio-visual data adaptation; audio-visual graphical models; audio-visual speech phonetic content; cross-model self-supervised learning; feature extraction; inference rules; learning rules; probabilistic generative model; speech detection; speech enhancement; speech processing; sprite linear subspace; video tracking; visual lip-reading; Acoustic noise; Acoustical engineering; Assembly systems; Audio-visual systems; Fuses; Graphical models; Lips; Speech enhancement; Speech processing; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1327194
Filename :
1327194