Regional Information Center for Science and Technology (RICEST) - On Dynamic Stream Weighting for Audio-Visual Speech Recognition

DocumentCode :

1351820

Title :

On Dynamic Stream Weighting for Audio-Visual Speech Recognition

Author :

Estellers, Virginia ; Gurban, Mihai ; Thiran, Jean-Philippe

Author_Institution :

Signal Process. Lab. LTS5, Ecole Polytech. Fed. de Lausanne (EPFL), Ecublens, Switzerland

Volume :

Issue :

fYear :

2012

fDate :

5/1/2012

Firstpage :

1145

Lastpage :

1157

Abstract :

The integration of audio and visual information improves speech recognition performance, especially in the presence of noise. In these circumstances, it is necessary to introduce audio and visual weights to control the contribution of each modality to the recognition task. We present a method to set the weight associated with each stream according to its reliability for speech recognition, allowing the weights to change with time and adapt to different noise and working conditions. Our dynamic weights are derived from several measures of stream reliability, some specific to speech processing and others inherent to any classification task, and take into account the special role of silence detection in the definition of audio and visual weights. In this paper, we propose a new confidence measure, compare it to existing ones, and point out the importance of correctly detecting silence utterances in the definition of the weighting system. Experimental results support our main contribution: including a voice activity detector in the weighting scheme improves speech recognition across different system architectures and confidence measures, yielding a gain in performance that outweighs any difference between the proposed confidence measures.

Keywords :

audio signal processing; audio streaming; audio-visual systems; signal classification; speech recognition; audio information; audio-visual speech recognition; classification task; dynamic stream weighting scheme; silence detection; silence utterance; speech processing; visual information; voice activity detector; Hidden Markov models; Reliability; Speech; Speech processing; Speech recognition; Visualization; Weight measurement; Adaptive weighting; audio-visual speech recognition; multi-modal classification; multi-stream hidden Markov model (HMM); robust speech recognition; stream reliability; voice activity detection (VAD)

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2172427

Filename :

6047566

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1351820