• DocumentCode
    1295089
  • Title

    A Markov random field approach to Bayesian speaker adaptation

  • Author

    Shahshahani, Ben M.

  • Author_Institution
    Speech Bus. Unit, IBM Corp., Boca Raton, FL, USA
  • Volume
    5
  • Issue
    2
  • fYear
    1997
  • fDate
    3/1/1997 12:00:00 AM
  • Firstpage
    183
  • Lastpage
    191
  • Abstract
    Speaker adaptation through Bayesian learning methodology is studied in this paper. In order to utilize the cross allophone correlations, a Markov random field (MRF) model is proposed as the joint prior distribution of the mean vectors of the allophones. Neighborhoods are defined as pairs of parameters between which strong correlations have been observed previously. Maximum a posteriori estimates of the mean vectors are obtained through an iterative optimization technique that converges to the global maximum of the posterior distribution. This process is similar to a recursive prediction of the parameters, where at each iteration each parameter is estimated by a weighted sum of two terms, the first predicted by the neighbors and the second by the samples. Further Bayesian smoothing of the output distributions is carried out by utilizing some simplifications on the functional forms of the marginal posterior distributions. The proposed method is fast, consuming only a few CPU minutes for processing hundreds of sentences from a new speaker on an IBM RS6000 Model 580 system. Experimental results show rapid improvement of recognition accuracy
  • Keywords
    Bayes methods; Markov processes; convergence of numerical methods; correlation methods; iterative methods; learning (artificial intelligence); maximum likelihood estimation; optimisation; random processes; smoothing methods; speech recognition; Bayesian learning; Bayesian smoothing; Bayesian speaker adaptation; IBM RS6000 Model 580 system; Markov random field approach; cross allophone correlations; functional forms; global maximum; iterative optimization technique; joint prior distribution; maximum a posteriori estimates; neighborhoods; output distributions; posterior distribution; recursive prediction; weighted sum; Bayesian methods; Data mining; Markov random fields; Maximum a posteriori estimation; Parameter estimation; Recursive estimation; Smoothing methods; Speech recognition; Training data; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.554780
  • Filename
    554780
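
The abstract describes an iteration in which each mean is re-estimated as a weighted sum of a term predicted by its neighbors and a term given by the samples. The following is a minimal sketch of that idea, not the paper's actual model. It assumes scalar means, a quadratic (Gaussian) pairwise prior that asks each pair of neighbors to keep its speaker-independent offset, a symmetric neighbor list, and a single noise variance. The function name `map_means` and the parameters `beta` and `noise_var` are hypothetical.

```python
def map_means(sample_means, counts, prior_means, neighbors,
              beta=1.0, noise_var=1.0, iters=200):
    """Coordinate-wise MAP updates under an assumed Gaussian MRF prior.

    Assumed objective (to maximize), with m = prior_means:
      -sum_i counts[i] / (2 * noise_var) * (sample_means[i] - mu[i])**2
      -beta / 2 * sum over neighbor pairs (i, j) of
                  (mu[i] - mu[j] - (m[i] - m[j]))**2
    """
    mu = list(prior_means)  # start from the speaker-independent means
    for _ in range(iters):
        for i in range(len(mu)):
            data_weight = counts[i] / noise_var      # weight of the sample term
            prior_weight = beta * len(neighbors[i])  # weight of the neighbor term
            if data_weight + prior_weight == 0:
                continue  # no samples and no neighbors: keep the prior mean
            # Each neighbor predicts mu[i] as its own current mean plus the
            # speaker-independent offset between the two allophones.
            predicted = sum(mu[j] + (prior_means[i] - prior_means[j])
                            for j in neighbors[i])
            mu[i] = ((data_weight * sample_means[i] + beta * predicted)
                     / (data_weight + prior_weight))
    return mu


# Toy case: allophone 0 has 10 samples, allophone 1 has none.
adapted = map_means(sample_means=[2.0, 0.0], counts=[10, 0],
                    prior_means=[0.0, 1.0], neighbors=[[1], [0]])
```

In this toy case the unseen allophone is moved along with its observed neighbor: the iteration settles near `[2.0, 3.0]`, preserving the prior offset of 1 between the two means. Because the assumed objective is a concave quadratic, these coordinate-wise updates approach its single maximum whenever every connected group of allophones contains at least one with samples.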