• DocumentCode
    591777
  • Title
    Alleviating the small sample-size problem in i-vector based speaker verification
  • Author
    Wei Rao ; Man-Wai Mak
  • Author_Institution
    Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China
  • fYear
    2012
  • fDate
    5-8 Dec. 2012
  • Firstpage
    335
  • Lastpage
    339
  • Abstract
    This paper investigates the small sample-size problem in i-vector based speaker verification systems. The idea of i-vectors is to represent the characteristics of speakers in the factors of a factor analyzer. Because the factor loading matrix defines the possible speaker and channel variability of i-vectors, it is important to suppress the unwanted channel variability. Linear discriminant analysis (LDA), within-class covariance normalization (WCCN), and probabilistic LDA are commonly used for this purpose. To perform well, however, these methods require training data from many speakers, each providing a sufficient number of recording sessions. Performance suffers when the number of speakers and/or the number of sessions per speaker is too small. This paper compares four approaches to addressing this small sample-size problem: (1) preprocessing the i-vectors by PCA before applying LDA (PCA+LDA), (2) replacing the matrix inverse in LDA by the pseudo-inverse, (3) applying multi-way LDA by exploiting the microphone and speaker labels of the training data, and (4) increasing the matrix rank in LDA by generating more i-vectors using utterance partitioning. Results based on NIST 2010 SRE suggest that utterance partitioning performs the best, followed by multi-way LDA and PCA+LDA.
  • Keywords
    covariance matrices; microphones; principal component analysis; speaker recognition; vectors; NIST 2010 SRE; PCA; WCCN; channel-variability; factor analyzer; factor loading matrix; i-vector; linear discriminant analysis; matrix inverse; matrix rank; microphone; multiway LDA; probabilistic LDA; pseudo-inverse; small sample-size problem; speaker label; speaker verification system; unwanted channel variability suppression; utterance partitioning; within-class covariance normalization; Covariance matrix; Microphones; NIST; Principal component analysis; Speech; Training; Vectors; LDA; Speaker verification; i-vectors; multi-way LDA; utterance partitioning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
  • Conference_Location
    Kowloon
  • Print_ISBN
    978-1-4673-2506-6
  • Electronic_ISBN
    978-1-4673-2505-9
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2012.6423527
  • Filename
    6423527
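
The first two approaches named in the abstract, PCA+LDA and pseudo-inverse LDA, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`scatter_matrices`, `lda_pinv`, `pca_lda`) are hypothetical, the input is assumed to be an array of i-vectors with one row per session and an array of speaker labels, and multi-way LDA and utterance partitioning are not shown.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return within-class (Sw) and between-class (Sb) scatter matrices."""
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for spk in np.unique(labels):
        Xs = X[labels == spk]
        mean_s = Xs.mean(axis=0)
        centered = Xs - mean_s
        Sw += centered.T @ centered
        diff = (mean_s - mean_all)[:, None]
        Sb += len(Xs) * (diff @ diff.T)
    return Sw, Sb

def top_eigvecs(M, n_components):
    """Eigenvectors of M with the largest real eigenvalues, as columns."""
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

def lda_pinv(X, labels, n_components):
    """LDA with the inverse of Sw replaced by the Moore-Penrose pseudo-inverse."""
    Sw, Sb = scatter_matrices(X, labels)
    return top_eigvecs(np.linalg.pinv(Sw) @ Sb, n_components)

def pca_lda(X, labels, n_pca, n_components):
    """Reduce to n_pca dimensions by PCA, then apply ordinary LDA.

    Assumes n_pca is small enough that Sw is invertible in the PCA subspace.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T                      # PCA projection, shape (dim, n_pca)
    Sw, Sb = scatter_matrices(Xc @ P, labels)
    W = top_eigvecs(np.linalg.solve(Sw, Sb), n_components)
    return P @ W                          # combined projection, shape (dim, n_components)
```

With few speakers and few sessions per speaker, the rank of the within-class scatter matrix is at most the number of training vectors minus the number of speakers, so it is singular whenever that figure falls below the i-vector dimension. Both functions above sidestep the resulting failure of the plain matrix inverse, one by inverting only on the non-null subspace and the other by shrinking the space before LDA is applied.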