DocumentCode
394253
Title
Discriminative acoustic model using eigenspace mapping for rapid speaker adaptation
Author
Zhou, Bowen ; Hansen, John H. L.
Author_Institution
Robust Speech Process. Group, Colorado Univ., Boulder, CO, USA
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of the speaker and the acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a fast speaker adaptation algorithm called Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with superior performance to conventional adaptation techniques such as block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR by encompassing the speaker-dependent discrimination information. A significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
Keywords
acoustic signal processing; covariance matrices; eigenvalues and eigenfunctions; speaker recognition; EigMap; MLLR; acoustic environment; baseline models; baseline recognizer; correlation; discriminative acoustic model; discriminative acoustic models; eigendirections; eigenspace mapping; fast speaker adaptation algorithm; rapid speaker adaptation; speaker dependent discrimination; speaker environment; speaker independent models; time-invariant characteristics; unsupervised adaptation experiments; utterance covariance matrix; Acoustic testing; Linear discriminant analysis; Loudspeakers; Maximum likelihood decoding; Maximum likelihood linear regression; Natural languages; Robustness; Speech processing; Speech recognition; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198779
Filename
1198779