• DocumentCode
    1497096
  • Title

    Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition

  • Author

    Ferras, Marc ; Leung, Cheung-Chi ; Barras, Claude ; Gauvain, Jean-Luc

  • Author_Institution
    LIMSI-CNRS, Orsay, France
  • Volume
    18
  • Issue
    6
  • fYear
    2010
  • Firstpage
    1366
  • Lastpage
    1378
  • Abstract
    In recent years, the speaker recognition field has made extensive use of speaker adaptation techniques. Adaptation allows speaker model parameters to be estimated using less speech data than is needed for maximum-likelihood (ML) training. The maximum a posteriori (MAP) and maximum-likelihood linear regression (MLLR) techniques have typically been used for adaptation. Recently, MAP and MLLR adaptation have been incorporated into the feature extraction stage of support vector machine (SVM)-based speaker recognition systems. Two approaches to feature extraction use an SVM to classify either the MAP-adapted Gaussian mean vector parameters (GSV-SVM) or the MLLR transform coefficients (MLLR-SVM). In this paper, we provide an experimental analysis of the GSV-SVM and MLLR-SVM approaches. We largely focus on the latter by exploring constrained and unconstrained transforms and different choices of the acoustic model. A channel-compensated front-end is used to prevent the MLLR transforms from adapting to channel components in the speech data. Additional acoustic models were trained using speaker adaptive training (SAT) to better estimate the speaker MLLR transforms. We provide results on the NIST 2005 and 2006 Speaker Recognition Evaluation (SRE) data and fusion results on the SRE 2006 data. The results show that using the compensated front-end, SAT models, and multiple regression classes brings major performance improvements.
  • Keywords
    feature extraction; maximum likelihood estimation; regression analysis; speaker recognition; Gaussian mean vector parameters; NIST; SVM-based speaker recognition; channel-compensated front-end; maximum a posteriori; maximum-likelihood linear regression techniques; maximum-likelihood training; speaker adaptation methods; speaker adaptation techniques; speaker adaptive training; speaker recognition evaluation; Constrained MLLR (CMLLR); Gaussian mixture model (GMM); Gaussian supervectors; maximum-likelihood linear regression (MLLR); support vector machine (SVM)
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2009.2034187
  • Filename
    5282585