DocumentCode
417270
Title
A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs
Author
Tamura, Satoshi ; Iwano, Koji ; Furui, Sadaoki
Author_Institution
Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
For multi-stream HMMs, which are widely used in audio-visual speech recognition, it is important to adjust stream weights automatically and properly. This paper proposes a stream-weight optimization technique based on a likelihood-ratio maximization criterion. In our audio-visual speech recognition system, video signals are captured and converted into visual features using HMM-based techniques. The extracted acoustic and visual features are concatenated into an audio-visual vector. A multi-stream HMM is obtained from the audio and visual HMMs. Experiments are conducted using Japanese connected digit speech recorded in real-world environments. Applying MLLR (maximum likelihood linear regression) adaptation and our optimization method, we achieve a 29% absolute accuracy improvement and a 76% relative error rate reduction compared with the audio-only scheme.
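The role of the stream weights mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common two-stream formulation in which the state log-likelihood is a weighted sum of the audio and visual log-likelihoods, with weights that sum to one; the abstract does not state this constraint, and the sketch does not implement the paper's likelihood-ratio maximization criterion. The function name `combined_log_likelihood` and the numeric values are hypothetical.

```python
import math


def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted state log-likelihood for a two-stream HMM.

    Assumption (not from the abstract): the visual weight is
    1 - audio_weight, so the two stream weights sum to one.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


# Hypothetical per-frame likelihoods for one HMM state.
log_b_a = math.log(0.02)  # audio stream, e.g. degraded by acoustic noise
log_b_v = math.log(0.10)  # visual stream

# Sweep the audio weight to see how the combined score shifts
# from visual-only (0.0) to audio-only (1.0).
for w in (0.0, 0.5, 1.0):
    print(w, combined_log_likelihood(log_b_a, log_b_v, w))
```

A lower audio weight makes the combined score rely more on the visual stream, which is why the weights need to be adjusted to the acoustic conditions.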
Keywords
error statistics; feature extraction; hidden Markov models; maximum likelihood estimation; optimisation; regression analysis; speech recognition; video signal processing; Japanese connected digit speech; MLLR; acoustic features; audio-visual speech recognition; error rate reduction; likelihood-ratio maximization criterion; maximum likelihood linear regression; multi-stream HMM; stream-weight optimization; video signal capturing; visual features; Acoustic noise; Automatic speech recognition; Computer science; Hidden Markov models; Maximum likelihood estimation; Maximum likelihood linear regression; Optimization methods; Robustness; Speech recognition; Streaming media;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326121
Filename
1326121