Title :
Smoothed N-best-based speaker adaptation for speech recognition
Author :
Matsui, Tomoko ; Matsuoka, Tatsuo ; Furui, Sadaoki
Author_Institution :
NTT Human Interface Labs., Tokyo, Japan
Abstract :
Smoothed estimation and utterance verification are introduced into the N-best-based speaker adaptation method. This method is effective even for speakers whose decodings using speaker-independent (SI) models are error-prone, that is, for the speakers who most need adaptation. Smoothed estimation improves performance for such speakers, and utterance verification reduces the amount of computation required. In connected-digit (four-digit string) recognition experiments conducted over actual telephone lines, the error rate fell by 36.4% for speakers whose decodings using SI models are error-prone. In search of an effective model transformation for speaker adaptation, we also discuss replacing mixture-mean bias estimation with the widely used mixture-mean linear-regression-matrix estimation.
Keywords :
decoding; error statistics; hidden Markov models; matrix algebra; smoothing methods; speaker recognition; speech processing; statistical analysis; adaptation techniques; connected digit recognition experiments; continuous mixture density HMM; decoding; error rate reduction; mixture mean linear regression matrix estimation; model transformation; performance evaluation; smoothed N-best-based speaker adaptation; smoothed estimation; speaker independent models; speech recognition; telephone lines; utterance verification; Adaptation model; Decoding; Equations; Error analysis; Hidden Markov models; Humans; Laboratories; Performance evaluation; Speech recognition; Telephony;
Conference_Title :
1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97)
Conference_Location :
Munich
Print_ISBN :
0-8186-7919-0
DOI :
10.1109/ICASSP.1997.596112