Discriminative acoustic model using eigenspace mapping for rapid speaker adaptation

Author

Zhou, Bowen ; Hansen, John H. L.

Author_Institution

Robust Speech Processing Group, University of Colorado, Boulder, CO, USA

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

It is widely believed that strong correlations exist across an utterance as a consequence of the time-invariant characteristics of the speaker and the acoustic environment. This paper verifies that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a fast speaker adaptation algorithm called Eigenspace Mapping (EigMap) is proposed and described. EigMap rapidly adapts the speaker-independent models by constructing discriminative acoustic models in the test speaker's eigenspace. Unsupervised adaptation experiments show that EigMap effectively improves baseline models using very limited amounts of adaptation data, outperforming conventional adaptation techniques such as block-diagonal MLLR. A relative improvement of 18.4% over the baseline recognizer is achieved using EigMap with only about 4.5 seconds of adaptation data. It is also demonstrated that EigMap is additive to MLLR by encompassing the speaker-dependent discrimination information. A significant relative improvement of 24.6% over the baseline is observed by combining the MLLR and EigMap techniques.
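
The abstract's central object, the primary eigendirections of an utterance covariance matrix, can be illustrated with a minimal sketch. This is not the EigMap algorithm itself, whose details are not given in this record; it only shows, on synthetic 2-D frames, how a leading eigendirection might be estimated from per-frame feature vectors. The function names, the synthetic data, and the choice of power iteration are all assumptions made for illustration.

```python
import math
import random


def covariance_matrix(frames):
    """Sample covariance of a list of equal-length feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        centered = [f[d] - mean[d] for d in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def leading_eigendirection(cov, iterations=200):
    """Power iteration: unit vector along the largest-eigenvalue direction.

    Assumes the start vector is not orthogonal to that direction.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical "utterance": 2-D frames whose variance lies mostly along (1, 1).
rng = random.Random(0)
frames = []
for _ in range(500):
    a = rng.gauss(0.0, 3.0)  # strong component
    b = rng.gauss(0.0, 0.3)  # weak component
    frames.append([a + b, a - b])

direction = leading_eigendirection(covariance_matrix(frames))
print(direction)
```

In a real recognizer the frames would be acoustic feature vectors of much higher dimension, and a library eigensolver would normally replace the hand-written power iteration.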

Keywords

acoustic signal processing; covariance matrices; eigenvalues and eigenfunctions; speaker recognition; EigMap; MLLR; acoustic environment; baseline models; baseline recognizer; correlation; discriminative acoustic model; discriminative acoustic models; eigendirections; eigenspace mapping; fast speaker adaptation algorithm; rapid speaker adaptation; speaker dependent discrimination; speaker environment; speaker independent models; time-invariant characteristics; unsupervised adaptation experiments; utterance covariance matrix; Acoustic testing; Linear discriminant analysis; Loudspeakers; Maximum likelihood decoding; Maximum likelihood linear regression; Natural languages; Robustness; Speech processing; Speech recognition; Training data;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198779

Filename

1198779