DocumentCode
1445832
Title
A structural Bayes approach to speaker adaptation
Author
Shinoda, Koichi ; Lee, Chin-Hui
Author_Institution
Dept. of Comput. & Commun. Media Res., NEC Corp., Kawasaki, Japan
Volume
9
Issue
3
fYear
2001
fDate
3/1/2001 12:00:00 AM
Firstpage
276
Lastpage
287
Abstract
Maximum a posteriori (MAP) estimation has been successfully applied to speaker adaptation in speech recognition systems using hidden Markov models. When the amount of data is sufficiently large, MAP estimation yields recognition performance as good as that obtained using maximum-likelihood (ML) estimation. This paper describes a structural maximum a posteriori (SMAP) approach to improve the MAP estimates obtained when the amount of adaptation data is small. A hierarchical structure in the model parameter space is assumed and the probability density functions for model parameters at one level are used as priors for those of the parameters at adjacent levels. Results of supervised adaptation experiments using nonnative speakers' utterances showed that SMAP estimation reduced error rates by 61% when ten utterances were used for adaptation and that it yielded the same accuracy as MAP and ML estimation when the amount of data was sufficiently large. Furthermore, the recognition results obtained in unsupervised adaptation experiments showed that SMAP estimation was effective even when only one utterance from a new speaker was used for adaptation. An effective way to combine rapid supervised adaptation and on-line unsupervised adaptation was also investigated.
Keywords
Bayes methods; hidden Markov models; maximum likelihood estimation; speech recognition; MAP estimation; SMAP approach; error rates; hierarchical structure; maximum a posteriori estimation; model parameter space; nonnative speaker; probability density functions; speaker adaptation; speech recognition systems; structural Bayes approach; structural maximum a posteriori approach; supervised adaptation experiments; unsupervised adaptation experiments; utterances; Adaptation model; Degradation; Estimation error; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Parameter estimation; Probability density function; Speech recognition; Yield estimation
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/89.906001
Filename
906001