Regional Information Center for Science and Technology - Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition

DocumentCode :

1117972

Title :

Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition

Author :

Buera, Luis ; Lleida, Eduardo ; Miguel, Antonio ; Ortega, Alfonso ; Saz, Óscar

Author_Institution :

Zaragoza Univ.

Volume :

Issue :

fYear :

2007

fDate :

3/1/2007

Firstpage :

1098

Lastpage :

1113

Abstract :

In this paper, a set of feature vector normalization methods based on the minimum mean square error (MMSE) criterion and stereo data is presented. They include multi-environment model-based linear normalization (MEMLIN), polynomial MEMLIN (P-MEMLIN), multi-environment model-based histogram normalization (MEMHIN), and phoneme-dependent MEMLIN (PD-MEMLIN). These methods model the clean and noisy feature vector spaces using Gaussian mixture models (GMMs). Their objective is to learn a transformation between clean and noisy feature vectors for each pair of clean and noisy model Gaussians. The direct approach to learning the transformation is to use stereo data, that is, noisy feature vectors together with the corresponding clean feature vectors. In this paper, however, a nonstereo-data-based training procedure is presented. The transformations can be modeled simply as a bias vector (MEMLIN), as a first-order polynomial (P-MEMLIN), or as a nonlinear function based on histogram equalization (MEMHIN). Further improvements are obtained by using a phoneme-dependent bias vector transformation (PD-MEMLIN), in which the clean and noisy feature vector spaces are split into several phonemes, each modeled as a GMM. These methods achieve significant word error rate improvements over other methods with similar targets. Experimental results on the SpeechDat Car database show an average word error rate improvement greater than 68% in all cases relative to the baseline when the original clean acoustic models are used, and up to 83% when the acoustic models are trained on the new normalized feature space.
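The abstract describes the MEMLIN idea only in words: each pair of clean and noisy Gaussians carries a bias vector, and a noisy frame is compensated by subtracting an MMSE-weighted combination of those biases. The following is a minimal, simplified sketch of that bias-vector idea, not the authors' implementation. It assumes a single noisy environment, diagonal-covariance GMMs that are already trained, stereo training pairs, and a cross-probability p(s_x | s_y) estimated from soft stereo counts. The paper's multi-environment weighting, its exact posterior terms, and its nonstereo training procedure are not reproduced. The function names `posteriors`, `train_biases`, and `normalize` are hypothetical.

```python
import math

def log_gauss_diag(v, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at vector v."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (a - m) ** 2 / s)
               for a, m, s in zip(v, mean, var))

def posteriors(v, gmm):
    """Component posteriors p(s | v) for a GMM given as [(weight, mean, var), ...]."""
    logs = [math.log(w) + log_gauss_diag(v, m, s) for w, m, s in gmm]
    top = max(logs)  # subtract the max log-likelihood for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def train_biases(clean, noisy, gmm_x, gmm_y):
    """Estimate bias r[i][j] = E[y - x | s_x=i, s_y=j] and p(s_x=i | s_y=j) from stereo pairs.

    Simplifying assumption (hypothetical): the joint weight of a pair is the product
    of the clean-space and noisy-space posteriors of that stereo frame.
    """
    dim, nx, ny = len(clean[0]), len(gmm_x), len(gmm_y)
    acc = [[[0.0] * dim for _ in range(ny)] for _ in range(nx)]
    cnt = [[0.0] * ny for _ in range(nx)]
    for x, y in zip(clean, noisy):
        px, py = posteriors(x, gmm_x), posteriors(y, gmm_y)
        for i in range(nx):
            for j in range(ny):
                w = px[i] * py[j]
                cnt[i][j] += w
                for d in range(dim):
                    acc[i][j][d] += w * (y[d] - x[d])
    bias = [[[a / cnt[i][j] if cnt[i][j] > 0 else 0.0 for a in acc[i][j]]
             for j in range(ny)] for i in range(nx)]
    col = [sum(cnt[i][j] for i in range(nx)) for j in range(ny)]
    cross = [[cnt[i][j] / col[j] if col[j] > 0 else 0.0 for j in range(ny)]
             for i in range(nx)]
    return bias, cross

def normalize(y, gmm_y, bias, cross):
    """MMSE-style estimate: x_hat = y - sum_j p(s_y=j|y) * sum_i p(s_x=i|s_y=j) * r[i][j]."""
    py = posteriors(y, gmm_y)
    x_hat = list(y)
    for j, pj in enumerate(py):
        for i in range(len(bias)):
            w = pj * cross[i][j]
            for d in range(len(y)):
                x_hat[d] -= w * bias[i][j][d]
    return x_hat
```

As a usage example, with one-dimensional features and noisy frames equal to clean frames plus a constant 1.0, every learned bias is about 1.0, so `normalize([3.0], gmm_y, bias, cross)` returns a value close to 2.0. Replacing the per-pair bias with a first-order polynomial or a histogram-equalization mapping would correspond to the P-MEMLIN and MEMHIN variants named above, which this sketch does not attempt.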

Keywords :

Gaussian processes; cepstral analysis; least mean squares methods; polynomials; speech recognition; Gaussian mixture models; SpeechDat Car database; cepstral vector normalization; feature vector normalization method; first-order polynomial; histogram equalization; minimum mean square error criterion; multi-environment model-based histogram normalization; multi-environment model-based linear normalization; noisy feature vector spaces; phoneme-dependent MEMLIN; phoneme-dependent bias vector transformation; polynomial MEMLIN; robust speech recognition; stereo data; Cepstral analysis; Error analysis; Gaussian noise; Histograms; Mean square error methods; Polynomials; Robustness; Spatial databases; Speech recognition; Vectors; Feature vector normalization; Gaussian mixture models (GMMs); minimum mean square error (MMSE)

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

Journal

DOI :

10.1109/TASL.2006.885244

Filename :

4100667

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1117972