DocumentCode
62390
Title
Linear Regression Based Acoustic Adaptation for the Subspace Gaussian Mixture Model
Author
Ghalehjegh, Sina Hamidi ; Rose, Richard C.
Author_Institution
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
Volume
22
Issue
9
fYear
2014
fDate
Sept. 2014
Firstpage
1391
Lastpage
1402
Abstract
This paper presents a study of two acoustic speaker adaptation techniques applied in the context of the subspace Gaussian mixture model (SGMM) for automatic speech recognition (ASR). First, a model-space, linear-regression-based approach is presented for adaptation of SGMM state projection vectors and is referred to as subspace vector adaptation (SVA). Second, an easy-to-implement realization of constrained maximum likelihood linear regression (CMLLR) is presented for feature-space adaptation in the SGMM. Numerically stable procedures for row-by-row estimation of the regression-based transformation matrices are presented for both SVA and CMLLR adaptation. These approaches are applied to SGMMs that are estimated using speaker adaptive training (SAT), a technique for estimating more compact speaker-independent acoustic models. Unsupervised speaker adaptation performance is evaluated on conversational and read speech task domains and compared to unsupervised adaptation performance obtained using the hidden Markov model-Gaussian mixture model (HMM-GMM) in ASR. It is shown that the feature-space and model-space adaptation approaches applied to the SGMM provide complementary reductions in word error rate (WER) and yield lower WERs than those obtained using CMLLR adaptation for the HMM-GMM.
Keywords
Gaussian processes; error statistics; hidden Markov models; mixture models; regression analysis; speech recognition; CMLLR; HMM-GMM; SGMM; SVA; WER; acoustic adaptation; acoustic speaker adaptation; automatic speech recognition; constrained maximum likelihood linear regression; feature space adaptation; hidden Markov model-Gaussian mixture model; model space linear regression; speaker adaptive training; subspace Gaussian mixture model; subspace vector adaptation; word error rate; Acoustics; Adaptation models; Covariance matrices; Hidden Markov models; Linear regression; Speech; Vectors; Automatic speech recognition; constrained maximum likelihood linear regression; speaker adaptation; subspace modeling
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2014.2332043
Filename
6840365