DocumentCode
1117972
Title
Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition
Author
Buera, Luis ; Lleida, Eduardo ; Miguel, Antonio ; Ortega, Alfonso ; Saz, Óscar
Author_Institution
Zaragoza Univ.
Volume
15
Issue
3
fYear
2007
fDate
3/1/2007
Firstpage
1098
Lastpage
1113
Abstract
In this paper, a set of feature vector normalization methods based on the minimum mean square error (MMSE) criterion and stereo data is presented. They include multi-environment model-based linear normalization (MEMLIN), polynomial MEMLIN (P-MEMLIN), multi-environment model-based histogram normalization (MEMHIN), and phoneme-dependent MEMLIN (PD-MEMLIN). These methods model the clean and noisy feature vector spaces using Gaussian mixture models (GMMs). Their objective is to learn a transformation between clean and noisy feature vectors for each pair of clean and noisy model Gaussians. The direct approach to learning the transformation is to use stereo data, that is, noisy feature vectors together with the corresponding clean feature vectors. In this paper, however, a nonstereo-data-based training procedure is presented. The transformations can be modeled as a bias vector (MEMLIN), as a first-order polynomial (P-MEMLIN), or as a nonlinear function based on histogram equalization (MEMHIN). Further improvements are obtained with a phoneme-dependent bias vector transformation (PD-MEMLIN), in which the clean and noisy feature vector spaces are split by phoneme and each phoneme is modeled as a GMM. These methods achieve significant word error rate improvements over other methods with similar objectives. Experimental results on the SpeechDat Car database show an average word error rate improvement of more than 68% over the baseline in all cases when the original clean acoustic models are used, and of up to 83% when the acoustic models are trained on the new normalized feature space.
Keywords
Gaussian processes; cepstral analysis; least mean squares methods; polynomials; speech recognition; Gaussian mixture models; SpeechDat Car database; cepstral vector normalization; feature vector normalization method; first-order polynomial; histogram equalization; minimum mean square error criterion; multi-environment model-based histogram normalization; multi-environment model-based linear normalization; noisy feature vector spaces; phoneme-dependent MEMLIN; phoneme-dependent bias vector transformation; polynomial MEMLIN; robust speech recognition; stereo data; Cepstral analysis; Error analysis; Gaussian noise; Histograms; Mean square error methods; Polynomials; Robustness; Spatial databases; Speech recognition; Vectors; Feature vector normalization; Gaussian mixture models (GMMs); minimum mean square error (MMSE); robust speech recognition
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2006.885244
Filename
4100667