Regional Information Center for Science and Technology (RICeST) - Integrating audio and visual information to provide highly robust speech recognition

DocumentCode :

1859877

Title :

Integrating audio and visual information to provide highly robust speech recognition

Author :

Tomlinson, M.J. ; Russell, M.J. ; Brooke, N.M.

Author_Institution :

Speech Res. Unit, DRA, Malvern, UK

Volume :

fYear :

1996

fDate :

7-10 May 1996

Firstpage :

821

Abstract :

Many human-machine interactions require accurate automatic speech recognition in the presence of high levels of interfering noise. The paper shows that recognition accuracy can be improved by including data derived from a speaker's lip images. We describe the combination of the audio and visual data in the construction of composite feature vectors and a hidden Markov model structure which allows for asynchrony between the audio and visual components. These ideas are applied to a speaker-dependent recognition task involving a small vocabulary and subject to interfering noise. The recognition results obtained using composite vectors and cross-product models are compared with those based on an audio-only feature vector. The benefit of this approach is shown to be increased performance over a very wide range of noise levels.
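
The two ideas named in the abstract, composite feature vectors and a cross-product model that tolerates audio-visual asynchrony, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nearest-frame alignment rule, and the `max_lag` limit on asynchrony are all assumptions introduced here for illustration, since the abstract does not give those details.

```python
from itertools import product


def composite_vectors(audio_frames, visual_frames):
    """Append to each audio frame the visual frame covering the same time.

    Assumption: both streams span the same interval, and the visual
    stream may have fewer frames (a lower frame rate) than the audio.
    """
    n_audio, n_visual = len(audio_frames), len(visual_frames)
    combined = []
    for i, audio in enumerate(audio_frames):
        # Map audio frame index i onto the visual frame at the same
        # relative position in time.
        j = min(i * n_visual // n_audio, n_visual - 1)
        combined.append(list(audio) + list(visual_frames[j]))
    return combined


def product_states(n_audio_states, n_visual_states, max_lag=1):
    """List (audio_state, visual_state) pairs of a cross-product model.

    Assumption: the two streams may drift apart by at most `max_lag`
    states; this bound is hypothetical, not taken from the paper.
    """
    return [
        (a, v)
        for a, v in product(range(n_audio_states), range(n_visual_states))
        if abs(a - v) <= max_lag
    ]


if __name__ == "__main__":
    audio = [[1, 2], [3, 4], [5, 6], [7, 8]]  # four 2-dim audio frames
    visual = [[9], [10]]                      # two 1-dim visual frames
    print(composite_vectors(audio, visual))
    print(product_states(3, 3, max_lag=1))
```

With `max_lag=0` the product state space collapses to the diagonal, which corresponds to forcing the audio and visual components to stay synchronous; larger values admit more asynchrony at the cost of more states.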

Keywords :

feature extraction; hidden Markov models; image processing; noise; speech recognition; HMM; audio data; audio-only feature vector; audio-visual information integration; automatic speech recognition; composite feature vectors; cross-product models; hidden Markov model; human machine interactions; interfering noise; lip images; noise levels; performance; recognition accuracy; recognition results; robust speech recognition; small vocabulary; speaker dependent recognition; visual data; Automatic speech recognition; Hidden Markov models; Humans; Image recognition; Noise level; Noise robustness; Speech enhancement; Speech recognition; Vocabulary; Working environment noise;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on

Conference_Location :

Atlanta, GA

ISSN :

1520-6149

Print_ISBN :

0-7803-3192-3

Type :

conf

DOI :

10.1109/ICASSP.1996.543247

Filename :

543247

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1859877