Title :
Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition
Author :
Buera, Luis ; Lleida, Eduardo ; Miguel, Antonio ; Ortega, Alfonso ; Saz, Óscar
Author_Institution :
Zaragoza Univ.
fDate :
3/1/2007
Abstract :
This paper presents a set of feature vector normalization methods based on the minimum mean square error (MMSE) criterion and stereo data. They include multi-environment model-based linear normalization (MEMLIN), polynomial MEMLIN (P-MEMLIN), multi-environment model-based histogram normalization (MEMHIN), and phoneme-dependent MEMLIN (PD-MEMLIN). These methods model the clean and noisy feature vector spaces using Gaussian mixture models (GMMs). Their objective is to learn a transformation between clean and noisy feature vectors associated with each pair of clean and noisy model Gaussians. The direct approach to learning the transformation is to use stereo data; that is, noisy feature vectors and the corresponding clean feature vectors. In this paper, however, a training procedure based on nonstereo data is presented. The transformation can be modeled simply as a bias vector (MEMLIN), as a first-order polynomial (P-MEMLIN), or as a nonlinear function based on histogram equalization (MEMHIN). Further improvements are obtained by using phoneme-dependent bias vector transformations (PD-MEMLIN). In PD-MEMLIN, the clean and noisy feature vector spaces are split into several phonemes, and each of them is modeled as a GMM. These methods achieve significant word error rate improvements over other methods with similar objectives. Experimental results on the SpeechDat Car database show an average word error rate improvement greater than 68% in all cases compared to the baseline when using the original clean acoustic models, and up to 83% when the acoustic models are trained on the new normalized feature space.
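Illustrative sketch (not taken from the paper): the bias-vector idea described in the abstract can be pictured with a small example. It assumes one-dimensional features, fixed GMM parameters, stereo pairs, and a single environment. The function names (`posteriors`, `train_bias`, `normalize`) and the weighted-average estimation formulas are assumptions made for illustration; the paper's exact estimators are not reproduced here.

```python
import math

def gauss_pdf(v, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posteriors(v, gmm):
    """Posterior probability of each GMM component given scalar v.

    gmm is a list of (weight, mean, variance) tuples (assumed layout).
    """
    likes = [w * gauss_pdf(v, m, s) for w, m, s in gmm]
    total = sum(likes)
    return [l / total for l in likes]

def train_bias(clean, noisy, clean_gmm, noisy_gmm, eps=1e-12):
    """Estimate one bias per (clean Gaussian i, noisy Gaussian j) pair.

    Each bias is a posterior-weighted average of (noisy - clean) over the
    stereo pairs. cross[i][j] approximates p(clean Gaussian i | noisy
    Gaussian j) from the same soft counts. Both formulas are assumptions.
    """
    n_i, n_j = len(clean_gmm), len(noisy_gmm)
    num = [[0.0] * n_j for _ in range(n_i)]
    den = [[0.0] * n_j for _ in range(n_i)]
    for x, y in zip(clean, noisy):
        px = posteriors(x, clean_gmm)
        py = posteriors(y, noisy_gmm)
        for i in range(n_i):
            for j in range(n_j):
                w = px[i] * py[j]
                num[i][j] += w * (y - x)
                den[i][j] += w
    bias = [[num[i][j] / (den[i][j] + eps) for j in range(n_j)]
            for i in range(n_i)]
    col = [sum(den[i][j] for i in range(n_i)) for j in range(n_j)]
    cross = [[den[i][j] / (col[j] + eps) for j in range(n_j)]
             for i in range(n_i)]
    return bias, cross

def normalize(y, noisy_gmm, bias, cross):
    """Estimate the clean feature by subtracting the expected bias from y."""
    py = posteriors(y, noisy_gmm)
    shift = sum(py[j] * cross[i][j] * bias[i][j]
                for i in range(len(bias)) for j in range(len(py)))
    return y - shift

# Toy stereo data: the noisy channel adds a constant offset of 2.0.
clean_gmm = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
noisy_gmm = [(0.5, 2.0, 1.0), (0.5, 6.0, 1.0)]
clean = [-0.5, 0.0, 0.5, 3.5, 4.0, 4.5]
noisy = [c + 2.0 for c in clean]
bias, cross = train_bias(clean, noisy, clean_gmm, noisy_gmm)
estimate = normalize(5.0, noisy_gmm, bias, cross)
```

In this toy setup every stereo pair differs by the same offset, so each learned bias is close to that offset and the normalized value is the noisy input shifted back by it. Real cepstral features are multi-dimensional, and the GMMs would be trained on data rather than fixed by hand.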
Keywords :
Gaussian processes; cepstral analysis; least mean squares methods; polynomials; speech recognition; Gaussian mixture models; SpeechDat Car database; cepstral vector normalization; feature vector normalization method; first-order polynomial; histogram equalization; minimum mean square error criterion; multi-environment model-based histogram normalization; multi-environment model-based linear normalization; noisy feature vector spaces; phoneme-dependent MEMLIN; phoneme-dependent bias vector transformation; polynomial MEMLIN; robust speech recognition; stereo data; Cepstral analysis; Error analysis; Gaussian noise; Histograms; Mean square error methods; Polynomials; Robustness; Spatial databases; Speech recognition; Vectors; Feature vector normalization; Gaussian mixture models (GMMs); minimum mean square error (MMSE); robust speech recognition;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.885244