• DocumentCode
    423349
  • Title
    Using kernel PCA to improve eigenvoice speaker adaptation
  • Author
    Mak, Brian ; Kwok, James T. ; Ho, Simon
  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Univ. of Sci. & Technol., China
  • Volume
    5
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    3062
  • Abstract
    Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data is available. Conventionally, these methods employ linear principal component analysis (PCA) to find the most important eigenvoices. Recently, in what we called kernel eigenvoice (KEV) speaker adaptation, we suggested the use of kernel PCA to compute the eigenvoices so as to exploit possible nonlinearity in the data. The major challenge is that unlike the standard eigenvoice (EV) method, an adapted speaker model found by KEV adaptation resides in the high-dimensional kernel-induced feature space; it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is the use of composite kernels in such a way that state observation likelihoods can be computed using only kernel functions. In an evaluation on the TIDIGITS task using less than 10 s of adaptation speech, it is found that KEV speaker adaptation using composite Gaussian kernels outperforms a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1 s and 4.1 s of speech.
  • Keywords
    Gaussian processes; eigenvalues and eigenfunctions; feature extraction; maximum likelihood estimation; principal component analysis; regression analysis; speaker recognition; MAP adaptation; TIDIGITS task; adapted speaker model; composite Gaussian kernels; decoding; eigenvoice weight estimation; high dimensional kernel induced feature space; kernel PCA; kernel eigenvoice speaker adaptation; linear principal component analysis; maximum likelihood linear regression adaptation; speaker independent model; state observation likelihoods; Computer science; Decoding; Face recognition; Gaussian processes; Independent component analysis; Kernel; Maximum likelihood linear regression; Principal component analysis; Speech analysis; State estimation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1378558
  • Filename
    1378558