  • DocumentCode
    742122
  • Title
    Acoustic Factor Analysis for Robust Speaker Verification
  • Author
    Hasan, T.; Hansen, John H. L.
  • Author_Institution
    Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA
  • Volume
    21
  • Issue
    4
  • fYear
    2013
  • fDate
    4/1/2013
  • Firstpage
    842
  • Lastpage
    853
  • Abstract
    Factor-analysis-based channel mismatch compensation methods for speaker recognition assume that speaker/utterance-dependent Gaussian Mixture Model (GMM) mean super-vectors can be constrained to reside in a lower-dimensional subspace. This approach does not consider the fact that conventional acoustic feature vectors also reside in a lower-dimensional manifold of the feature space when the feature covariance matrices contain close-to-zero eigenvalues. In this study, based on observations of the covariance structure of acoustic features, we propose a factor analysis modeling scheme in the acoustic feature space instead of the super-vector space and derive a mixture-dependent feature transformation. We demonstrate how this single linear transformation performs feature dimensionality reduction, decorrelation, normalization, and enhancement at once. The proposed transformation is shown to be closely related to signal-subspace-based speech enhancement schemes. In contrast to traditional front-end mixture-dependent feature transformations, where feature alignment is performed using the highest-scoring mixture, the proposed transformation is integrated within the speaker recognition system using a probabilistic feature alignment technique, which eliminates the need to regenerate the features or retrain the Universal Background Model (UBM). Incorporating the proposed method into a state-of-the-art i-vector and Gaussian Probabilistic Linear Discriminant Analysis (PLDA) framework, we perform evaluations on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 core telephone and microphone tasks. The experimental results demonstrate the superiority of the proposed scheme over both full-covariance and diagonal-covariance UBM-based systems. Simple equal-weight fusion of the baseline and proposed systems also yields significant performance gains.
  • Keywords
    acoustic correlation; compensation; covariance matrices; eigenvalues and eigenfunctions; feature extraction; principal component analysis; probability; speaker recognition; speech enhancement; GMM; Gaussian mixture model; PLDA; UBM; acoustic factor analysis; acoustic feature space; acoustic feature vector; channel mismatch compensation; close to zero eigenvalue; covariance structure; decorrelation analysis; feature covariance matrix; feature dimensionality reduction; feature transformation; front-end mixture; i-vector; linear transformation; microphone; mixture dependent feature transformation; normalization; probabilistic feature alignment; probabilistic linear discriminant analysis; robust speaker verification; signal subspace; speech enhancement scheme; super vector space; telephone; universal background model; Acoustics; Covariance matrix; Eigenvalues and eigenfunctions; Feature extraction; Gain; Speaker recognition; Vectors; Acoustic feature enhancement; factor analysis; probabilistic principal component analysis; speaker verification;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2226161
  • Filename
    6338275