Using kernel PCA to improve eigenvoice speaker adaptation

Author

Mak, Brian ; Kwok, James T. ; Ho, Simon

Author_Institution

Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., China

Volume

5

fYear

2004

fDate

26-29 Aug. 2004

Firstpage

3062

Abstract

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data is available. Conventionally, these methods employ linear principal component analysis (PCA) to find the most important eigenvoices. Recently, in what we call kernel eigenvoice (KEV) speaker adaptation, we suggested using kernel PCA to compute the eigenvoices so as to exploit possible nonlinearity in the data. The major challenge is that, unlike in the standard eigenvoice (EV) method, an adapted speaker model found by KEV adaptation resides in the high-dimensional kernel-induced feature space. It is therefore not clear how to obtain the constituent Gaussians of the adapted model, which are needed to compute state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is to use composite kernels in such a way that state observation likelihoods can be computed using only kernel functions. In an evaluation on the TIDIGITS task using less than 10 s of adaptation speech, KEV speaker adaptation using composite Gaussian kernels outperforms a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1 s and 4.1 s of speech.
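The kernel PCA step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the composite kernel, so the sum of per-state Gaussian kernels used here is an assumption, as are the function names (`composite_gaussian_kernel`, `kernel_pca`), the width parameter `beta`, and the random stand-in data. It is not the authors' implementation and does not cover weight estimation or likelihood computation.

```python
import numpy as np

def composite_gaussian_kernel(X, Y, n_parts, beta=1.0):
    """Assumed composite kernel: a sum of Gaussian kernels, one per
    equal-size slice (e.g. one per state) of each speaker supervector."""
    Xp = X.reshape(X.shape[0], n_parts, -1)   # (n, R, d)
    Yp = Y.reshape(Y.shape[0], n_parts, -1)   # (m, R, d)
    # Squared Euclidean distance between matching slices: (n, m, R)
    d2 = ((Xp[:, None, :, :] - Yp[None, :, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * d2).sum(axis=-1)

def kernel_pca(K, n_components):
    """Standard kernel PCA on a precomputed kernel matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = H @ K @ H                            # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale so each feature-space eigenvector has unit norm
    alphas = eigvecs / np.sqrt(eigvals)
    return eigvals, alphas, Kc

# Hypothetical data: 20 training speakers, 3 states with 4-dim means each
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(20, 3 * 4))

K = composite_gaussian_kernel(supervectors, supervectors, n_parts=3, beta=0.5)
eigvals, alphas, Kc = kernel_pca(K, n_components=5)
projections = Kc @ alphas                     # training speakers on 5 kernel eigenvoices
```

Because the leading eigenvectors are only available through the coefficients `alphas`, every later quantity has to be expressed via kernel evaluations, which is the difficulty the abstract refers to.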

Keywords

Gaussian processes; eigenvalues and eigenfunctions; feature extraction; maximum likelihood estimation; principal component analysis; regression analysis; speaker recognition; MAP adaptation; TIDIGITS task; adapted speaker model; composite Gaussian kernels; decoding; eigenvoice weight estimation; high dimensional kernel induced feature space; kernel PCA; kernel eigenvoice speaker adaptation; linear principal component analysis; maximum likelihood linear regression adaptation; speaker independent model; state observation likelihoods; Computer science; Decoding; Face recognition; Gaussian processes; Independent component analysis; Kernel; Maximum likelihood linear regression; Principal component analysis; Speech analysis; State estimation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on

Print_ISBN

0-7803-8403-2

Type

conf

DOI

10.1109/ICMLC.2004.1378558

Filename

1378558