Title :
Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics
Author :
Gomez, Randy ; Toda, Tomoki ; Saruwatari, Hiroshi ; Shikano, Kiyohiro
Author_Institution :
Graduate Sch. of Inf. Sci., Nara Inst. of Sci. & Technol.
Abstract :
Real-time speech recognition applications need a fast and reliable adaptation algorithm. We propose a method to reduce the adaptation time of unsupervised speaker adaptation based on HMM sufficient statistics. We use only a single arbitrary utterance, without transcription, to select the N-best speakers' sufficient statistics, which are created offline and provide the data for adapting to a target speaker. Reducing N further shortens the adaptation time, but it degrades recognition performance because too little data remains to adapt the model robustly. Linear interpolation with the global HMM sufficient statistics offsets this negative effect and achieves a 50% reduction in adaptation time, from 10 sec to 5 sec, without degrading word accuracy. Furthermore, we compared our method with vocal tract length normalization (VTLN), maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR). We also tested in office, car, crowd and booth noise environments at SNRs of 10 dB, 15 dB, 20 dB and 25 dB.
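The selection and interpolation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each training speaker's sufficient statistics for one Gaussian reduce to an occupancy count and a first-order sum, that a per-speaker score for the target utterance (for example a log-likelihood) is already available, and that the interpolation is a simple convex combination with weight lam. The names Stats, select_n_best, pool, interpolate and mean_of, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stats:
    """Sufficient statistics of one Gaussian (illustrative, mean only)."""
    count: float        # zeroth-order statistic: total occupancy
    first: List[float]  # first-order statistic: occupancy-weighted feature sum


def select_n_best(scores: Dict[str, float], n: int) -> List[str]:
    """Return the ids of the n speakers with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def pool(stats_list: List[Stats]) -> Stats:
    """Sum the statistics of several speakers."""
    dim = len(stats_list[0].first)
    count = sum(s.count for s in stats_list)
    first = [sum(s.first[d] for s in stats_list) for d in range(dim)]
    return Stats(count, first)


def interpolate(selected: Stats, global_stats: Stats, lam: float) -> Stats:
    """Convex combination of N-best and global statistics (assumed form)."""
    count = (1.0 - lam) * selected.count + lam * global_stats.count
    first = [(1.0 - lam) * a + lam * b
             for a, b in zip(selected.first, global_stats.first)]
    return Stats(count, first)


def mean_of(stats: Stats) -> List[float]:
    """Re-estimate the Gaussian mean from the statistics."""
    return [x / stats.count for x in stats.first]


# Hypothetical offline statistics for three training speakers.
speaker_stats = {
    "spk_a": Stats(2.0, [2.0, 4.0]),
    "spk_b": Stats(2.0, [6.0, 4.0]),
    "spk_c": Stats(4.0, [40.0, 40.0]),
}
# Hypothetical scores of the single target utterance against each speaker.
scores = {"spk_a": -10.0, "spk_b": -12.5, "spk_c": -30.0}

best = select_n_best(scores, n=2)
selected = pool([speaker_stats[s] for s in best])
global_stats = pool(list(speaker_stats.values()))
adapted_mean = mean_of(interpolate(selected, global_stats, lam=0.25))
```

With a small N, the pooled count is small and the estimate is noisy; mixing in the global statistics adds data at the cost of pulling the estimate toward the speaker-independent model. The weight 0.25 here is arbitrary.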
Keywords :
hidden Markov models; interpolation; speech recognition; HMM sufficient statistics; linear interpolation; real-time speech recognition; single arbitrary utterance; unsupervised speaker adaptation; Adaptation model; Databases; Degradation; Hidden Markov models; Information science; Interpolation; Loudspeakers; Maximum likelihood linear regression; Statistics; Training data
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660192