A Markov random field approach to Bayesian speaker adaptation

Author

Shahshahani, Ben M.

Author_Institution

Speech Bus. Unit, IBM Corp., Boca Raton, FL, USA

Volume

5

Issue

2

fYear

1997

fDate

3/1/1997

Firstpage

183

Lastpage

191

Abstract

This paper studies speaker adaptation through Bayesian learning. To exploit cross-allophone correlations, a Markov random field (MRF) model is proposed as the joint prior distribution of the mean vectors of the allophones. Neighborhoods are defined as pairs of parameters between which strong correlations have been observed previously. Maximum a posteriori estimates of the mean vectors are obtained through an iterative optimization technique that converges to the global maximum of the posterior distribution. This process resembles a recursive prediction of the parameters: at each iteration, each parameter is estimated as a weighted sum of two terms, the first predicted by its neighbors and the second by the samples. Further Bayesian smoothing of the output distributions is carried out by simplifying the functional forms of the marginal posterior distributions. The proposed method is fast, consuming only a few CPU minutes to process hundreds of sentences from a new speaker on an IBM RS6000 Model 580 system. Experimental results show rapid improvement of recognition accuracy.
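The iterative update described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a simple pairwise Gaussian MRF prior in which each neighbor predicts an allophone's mean by carrying over its own shift from the speaker-independent mean, a shared noise variance, and a symmetric neighbor map. The function name `mrf_map_means` and the parameters `weight` and `noise_var` are hypothetical, chosen only for illustration.

```python
import numpy as np

def mrf_map_means(prior_means, sample_means, counts, neighbors,
                  weight=1.0, noise_var=1.0, n_iter=50):
    """Coordinate-ascent MAP estimate of allophone means (illustrative only).

    Assumed model (not taken from the paper):
      prior      ~ exp(-weight/2 * sum over neighbor pairs (i, j) of
                       ||(mu_i - m_i) - (mu_j - m_j)||^2)
      likelihood ~ exp(-counts[i] / (2 * noise_var) * ||xbar_i - mu_i||^2)
    `neighbors` maps an index to a list of neighbor indices and is
    assumed symmetric.
    """
    m = np.asarray(prior_means, dtype=float)      # speaker-independent means, shape (K, D)
    xbar = np.asarray(sample_means, dtype=float)  # per-allophone sample means, shape (K, D)
    n = np.asarray(counts, dtype=float)           # adaptation frames seen per allophone
    mu = m.copy()
    for _ in range(n_iter):
        for i in range(len(mu)):
            nbrs = neighbors.get(i, [])
            prior_prec = weight * len(nbrs)       # weight of the neighbor term
            data_prec = n[i] / noise_var          # weight of the sample term
            if prior_prec + data_prec == 0.0:
                continue                          # no neighbors, no data: keep prior mean
            num = np.zeros_like(mu[i])
            if nbrs:
                # Each neighbor predicts mu[i] by carrying over its own shift.
                pred = np.mean([m[i] + (mu[j] - m[j]) for j in nbrs], axis=0)
                num += prior_prec * pred
            if n[i] > 0:
                num += data_prec * xbar[i]
            # Weighted sum of the neighbor prediction and the sample mean.
            mu[i] = num / (prior_prec + data_prec)
    return mu
```

Under these assumptions the log posterior is a concave quadratic, so the coordinate-wise updates climb toward its global maximum, and allophones with no adaptation data are still moved through their neighbors.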

Keywords

Bayes methods; Markov processes; convergence of numerical methods; correlation methods; iterative methods; learning (artificial intelligence); maximum likelihood estimation; optimisation; random processes; smoothing methods; speech recognition; Bayesian learning; Bayesian smoothing; Bayesian speaker adaptation; IBM RS6000 Model 580 system; Markov random field approach; cross allophone correlations; functional forms; global maximum; iterative optimization technique; joint prior distribution; maximum a posteriori estimates; neighborhoods; output distributions; posterior distribution; recursive prediction; weighted sum; Bayesian methods; Data mining; Markov random fields; Maximum a posteriori estimation; Parameter estimation; Recursive estimation; Smoothing methods; Speech recognition; Training data; Vocabulary

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.554780

Filename

554780