Title :
Using kernel PCA to improve eigenvoice speaker adaptation
Author :
Mak, Brian ; Kwok, James T. ; Ho, Simon
Author_Institution :
Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., China
Abstract :
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data is available. Conventionally, these methods employ linear principal component analysis (PCA) to find the most important eigenvoices. Recently, we proposed kernel eigenvoice (KEV) speaker adaptation, which uses kernel PCA to compute the eigenvoices so as to exploit possible nonlinearity in the data. The major challenge is that, unlike in the standard eigenvoice (EV) method, an adapted speaker model found by KEV adaptation resides in the high-dimensional kernel-induced feature space, so it is not clear how to obtain the constituent Gaussians of the adapted model that are needed to compute state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is to use composite kernels in such a way that state observation likelihoods can be computed using only kernel functions. In an evaluation on the TIDIGITS task using less than 10 s of adaptation speech, KEV speaker adaptation using composite Gaussian kernels outperforms a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation with 2.1 s and 4.1 s of adaptation speech.
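As an illustration of the kernel PCA step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each speaker is represented by a supervector of concatenated Gaussian means, and it assumes one plausible form of composite kernel: a sum of Gaussian kernels, one per constituent Gaussian block. The function names, the width parameter `beta`, and the random toy data are all hypothetical.

```python
import numpy as np

def composite_gaussian_kernel(X, Y, n_blocks, beta=1.0):
    """Assumed composite kernel: sum of Gaussian kernels over equal-sized blocks.

    X, Y: arrays of shape (n_speakers, n_blocks * block_dim), one supervector per row.
    """
    Xb = X.reshape(X.shape[0], n_blocks, -1)
    Yb = Y.reshape(Y.shape[0], n_blocks, -1)
    # Squared Euclidean distance per block, shape (n_x, n_y, n_blocks)
    d2 = ((Xb[:, None, :, :] - Yb[None, :, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * d2).sum(axis=-1)

def kernel_pca(K, n_components):
    """Standard kernel PCA on a precomputed kernel matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel matrix centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale coefficients so each feature-space eigenvector has unit norm
    alphas = eigvecs / np.sqrt(eigvals)
    return eigvals, alphas, Kc

# Toy data (hypothetical): 20 training speakers, 4 Gaussians of dimension 3 each
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(20, 12))

K = composite_gaussian_kernel(supervectors, supervectors, n_blocks=4, beta=0.5)
eigvals, alphas, Kc = kernel_pca(K, n_components=3)

# Coordinates of each training speaker along the leading kernel eigenvoices
projections = Kc @ alphas
```

Everything here is computed from kernel evaluations alone; the feature-space eigenvectors are never formed explicitly, which is the property the abstract relies on when it says likelihoods can be computed using only kernel functions.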
Keywords :
Gaussian processes; eigenvalues and eigenfunctions; feature extraction; maximum likelihood estimation; principal component analysis; regression analysis; speaker recognition; MAP adaptation; TIDIGITS task; adapted speaker model; composite Gaussian kernels; decoding; eigenvoice weight estimation; high dimensional kernel induced feature space; kernel PCA; kernel eigenvoice speaker adaptation; linear principal component analysis; maximum likelihood linear regression adaptation; speaker independent model; state observation likelihoods; Computer science; Face recognition; Independent component analysis; Kernel; Maximum likelihood linear regression; Speech analysis; State estimation
Conference_Title :
Proceedings of 2004 International Conference on Machine Learning and Cybernetics
Print_ISBN :
0-7803-8403-2
DOI :
10.1109/ICMLC.2004.1378558