Regional Information Center for Science and Technology - Audio-visual speech recognition in noisy audio environments

DocumentCode :

3251407

Title :

Audio-visual speech recognition in noisy audio environments

Author :

Palecek, Karel ; Chaloupka, J.

Author_Institution :

Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic

fYear :

2013

fDate :

2-4 July 2013

Firstpage :

484

Lastpage :

487

Abstract :

It is well known that the visual part of speech can improve the recognition rate, mainly in noisy conditions. The main goal of this work is to find a set of visual features that can be used in our audio-visual speech recognition systems. Discrete Cosine Transform (DCT) and Active Appearance Model (AAM) based visual features are extracted from visual speech signals, enhanced by a simplified variant of Hierarchical Linear Discriminant Analysis (HiLDA), and normalized across speakers. The visual features are then combined with standard MFCC audio features using the middle fusion method. The audio-visual recognition results are compared with results from experiments in which additive noise in the audio signal is reduced by the log-spectra minimum mean square error and multiband spectral subtraction methods.

Keywords :

audio-visual systems; discrete cosine transforms; feature extraction; image fusion; image recognition; mean square error methods; speech enhancement; speech recognition; AAM; DCT; HiLDA; MFCC audio features; active appearance model; additive noise; audio-visual speech recognition; discrete cosine transform; hierarchical linear discriminant analysis; log-spectra minimum mean square error; middle fusion method; multiband spectral subtraction; noisy audio environments; visual feature extraction; visual features; visual speech signals; Active appearance model; Discrete cosine transforms; Feature extraction; Shape; Speech; Speech recognition; Visualization; AAM; HiLDA; audio-visual speech recognition; speech enhancement; visual speech feature extraction;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Telecommunications and Signal Processing (TSP), 2013 36th International Conference on

Conference_Location :

Rome

Print_ISBN :

978-1-4799-0402-0

Type :

conf

DOI :

10.1109/TSP.2013.6613979

Filename :

6613979

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3251407