Linear Regression Based Acoustic Adaptation for the Subspace Gaussian Mixture Model

Author

Ghalehjegh, Sina Hamidi ; Rose, Richard C.

Author_Institution

Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada

Volume

22

Issue

9

fYear

2014

fDate

Sept. 2014

Firstpage

1391

Lastpage

1402

Abstract

This paper presents a study of two acoustic speaker adaptation techniques applied in the context of the subspace Gaussian mixture model (SGMM) for automatic speech recognition (ASR). First, a model-space, linear-regression-based approach is presented for adaptation of SGMM state projection vectors and is referred to as subspace vector adaptation (SVA). Second, an easy-to-implement realization of constrained maximum likelihood linear regression (CMLLR) is presented for feature-space adaptation in the SGMM. Numerically stable procedures for row-by-row estimation of the regression-based transformation matrices are presented for both SVA and CMLLR adaptation. These approaches are applied to SGMMs that are estimated using speaker adaptive training (SAT), a technique for estimating more compact speaker-independent acoustic models. Unsupervised speaker adaptation performance is evaluated on conversational and read speech task domains and compared to unsupervised adaptation performance obtained using the hidden Markov model-Gaussian mixture model (HMM-GMM) in ASR. It is shown that the feature-space and model-space adaptation approaches applied to the SGMM provide complementary reductions in word error rate (WER) and provide lower WERs than those obtained using CMLLR adaptation for the HMM-GMM.

Keywords

Gaussian processes; error statistics; hidden Markov models; mixture models; regression analysis; speech recognition; CMLLR; HMM-GMM; SGMM; SVA; WER; acoustic adaptation; acoustic speaker adaptation; automatic speech recognition; constrained maximum likelihood linear regression; feature space adaptation; hidden Markov model-Gaussian mixture model; model space linear regression; speaker adaptive training; subspace Gaussian mixture model; subspace vector adaptation; word error rate; Acoustics; Adaptation models; Covariance matrices; Hidden Markov models; Linear regression; Speech; Vectors; Automatic speech recognition; constrained maximum likelihood linear regression; speaker adaptation; subspace modeling

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2332043

Filename

6840365