Title :
A structural Bayes approach to speaker adaptation
Author :
Shinoda, Koichi ; Lee, Chin-Hui
Author_Institution :
Dept. of Comput. & Commun. Media Res., NEC Corp., Kawasaki, Japan
fDate :
3/1/2001
Abstract :
Maximum a posteriori (MAP) estimation has been successfully applied to speaker adaptation in speech recognition systems using hidden Markov models. When the amount of data is sufficiently large, MAP estimation yields recognition performance as good as that obtained using maximum-likelihood (ML) estimation. This paper describes a structural maximum a posteriori (SMAP) approach to improve the MAP estimates obtained when the amount of adaptation data is small. A hierarchical structure in the model parameter space is assumed and the probability density functions for model parameters at one level are used as priors for those of the parameters at adjacent levels. Results of supervised adaptation experiments using nonnative speakers' utterances showed that SMAP estimation reduced error rates by 61% when ten utterances were used for adaptation and that it yielded the same accuracy as MAP and ML estimation when the amount of data was sufficiently large. Furthermore, the recognition results obtained in unsupervised adaptation experiments showed that SMAP estimation was effective even when only one utterance from a new speaker was used for adaptation. An effective way to combine rapid supervised adaptation and on-line unsupervised adaptation was also investigated.
Keywords :
Bayes methods; hidden Markov models; maximum likelihood estimation; speech recognition; MAP estimation; SMAP approach; error rates; hidden Markov models; hierarchical structure; maximum a posteriori estimation; model parameter space; nonnative speaker; probability density functions; speaker adaptation; speech recognition systems; structural Bayes approach; structural maximum a posteriori approach; supervised adaptation experiments; unsupervised adaptation experiments; utterances; Adaptation model; Degradation; Estimation error; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Parameter estimation; Probability density function; Speech recognition; Yield estimation;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on