Title :
A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs
Author :
Tamura, Satoshi ; Iwano, Koji ; Furui, Sadaoki
Author_Institution :
Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan
Abstract :
For multi-stream HMMs, which are widely used in audio-visual speech recognition, it is important to adjust the stream weights automatically and properly. This paper proposes a stream-weight optimization technique based on a likelihood-ratio maximization criterion. In our audio-visual speech recognition system, video signals are captured and converted into visual features using HMM-based techniques. The extracted acoustic and visual features are concatenated into an audio-visual vector, and a multi-stream HMM is obtained from the audio and visual HMMs. Experiments are conducted using Japanese connected-digit speech recorded in real-world environments. Applying MLLR (maximum likelihood linear regression) adaptation together with our optimization method, we achieve a 29% absolute accuracy improvement and a 76% relative error rate reduction compared with the audio-only scheme.
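Illustration (not part of the original record): the abstract does not give the multi-stream formula, so the sketch below assumes the commonly used form in which the audio and visual log-likelihoods of a state are combined linearly with weights that sum to one. It also shows how an absolute accuracy gain relates to a relative error rate reduction. The function names and all numbers are hypothetical and are not taken from the paper; the paper's likelihood-ratio maximization criterion itself is not reproduced here.

```python
import math


def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Weighted sum of per-stream log-likelihoods (assumed standard form).

    audio_weight is the audio stream weight; the visual stream weight is
    taken as 1 - audio_weight, which is an assumption of this sketch.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Relative error rate reduction, with accuracies given in percent."""
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one state and one frame.
    print(combined_log_likelihood(-10.0, -20.0, audio_weight=0.7))
    # Hypothetical accuracies: a 30-point absolute gain from 60% to 90%.
    print(relative_error_reduction(60.0, 90.0))
```

With weights that sum to one, lowering the audio weight shifts the decision toward the visual stream, which is the intuition behind tuning stream weights when the audio is noisy.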
Keywords :
error statistics; feature extraction; hidden Markov models; maximum likelihood estimation; optimisation; regression analysis; speech recognition; video signal processing; Japanese connected digit speech; MLLR; acoustic features; audio-visual speech recognition; error rate reduction; likelihood-ratio maximization criterion; maximum likelihood linear regression; multi-stream HMM; stream-weight optimization; video signal capturing; visual features; Acoustic noise; Automatic speech recognition; Computer science; Hidden Markov models; Maximum likelihood estimation; Maximum likelihood linear regression; Optimization methods; Robustness; Speech recognition; Streaming media;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1326121