Acoustic Factor Analysis for Robust Speaker Verification

Author

Hasan, T.; Hansen, John H. L.

Author_Institution

Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA

Volume

Issue

fYear

2013

fDate

4/1/2013 12:00:00 AM

Firstpage

842

Lastpage

853

Abstract

Factor analysis based channel mismatch compensation methods for speaker recognition rest on the assumption that speaker/utterance dependent Gaussian Mixture Model (GMM) mean super-vectors can be constrained to reside in a lower dimensional subspace. This approach does not consider the fact that conventional acoustic feature vectors also reside in a lower dimensional manifold of the feature space when feature covariance matrices contain close-to-zero eigenvalues. In this study, based on observations of the covariance structure of acoustic features, we propose a factor analysis modeling scheme in the acoustic feature space instead of the super-vector space and derive a mixture dependent feature transformation. We demonstrate how this single linear transformation performs feature dimensionality reduction, de-correlation, normalization and enhancement at once. The proposed transformation is shown to be closely related to signal subspace based speech enhancement schemes. In contrast to traditional front-end mixture dependent feature transformations, where feature alignment is performed using the highest scoring mixture, the proposed transformation is integrated within the speaker recognition system using a probabilistic feature alignment technique, which removes the need for regenerating the features or retraining the Universal Background Model (UBM). Incorporating the proposed method with a state-of-the-art i-vector and Gaussian Probabilistic Linear Discriminant Analysis (PLDA) framework, we perform evaluations on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 core telephone and microphone tasks. The experimental results demonstrate the superiority of the proposed scheme compared to both full-covariance and diagonal covariance UBM based systems. Simple equal-weight fusion of the baseline and proposed systems also yields significant performance gains.
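
The mixture dependent transformation and probabilistic alignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the textbook probabilistic PCA posterior-mean projection for each UBM mixture and soft (posterior-weighted) accumulation of the transformed features, and the function names `ppca_transform`, `responsibilities`, and `aligned_stats`, as well as the latent dimension `q`, are hypothetical choices made for illustration.

```python
import numpy as np

def ppca_transform(cov, q):
    """Illustrative PPCA projection for one mixture (assumed form, not the paper's exact one).

    Returns A with shape (q, d) such that z = A @ (x - mean) is the posterior
    mean of the q latent factors under probabilistic PCA. Requires q < d.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                 # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                       # noise variance: mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                        # factor loading matrix, shape (d, q)
    M = W.T @ W + sigma2 * np.eye(q)
    return np.linalg.solve(M, W.T)                    # M^{-1} W^T

def responsibilities(X, weights, means, covs):
    """Posterior probability of each full-covariance mixture for each frame."""
    n, d = X.shape
    log_p = np.empty((n, len(weights)))
    for g, (w, mu, S) in enumerate(zip(weights, means, covs)):
        L = np.linalg.cholesky(S)
        sol = np.linalg.solve(L, (X - mu).T)          # whitened residuals, shape (d, n)
        maha = np.sum(sol ** 2, axis=0)               # squared Mahalanobis distances
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        log_p[:, g] = np.log(w) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)
    log_p -= log_p.max(axis=1, keepdims=True)         # stabilise before exponentiating
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)

def aligned_stats(X, weights, means, covs, q):
    """Zero- and first-order statistics of the transformed features.

    Each frame contributes to every mixture in proportion to its posterior,
    instead of being assigned only to the highest scoring mixture.
    """
    gamma = responsibilities(X, weights, means, covs)  # shape (n, G)
    N = gamma.sum(axis=0)                              # shape (G,)
    F = np.stack([
        gamma[:, g] @ ((X - means[g]) @ ppca_transform(covs[g], q).T)
        for g in range(len(weights))
    ])                                                 # shape (G, q)
    return N, F
```

Because the projection is applied per mixture when the statistics are accumulated, the original features and the UBM parameters are reused unchanged in this sketch, which is the property the abstract attributes to probabilistic feature alignment.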

Keywords

acoustic correlation; compensation; covariance matrices; eigenvalues and eigenfunctions; feature extraction; principal component analysis; probability; speaker recognition; speech enhancement; GMM; Gaussian mixture model; PLDA; UBM; acoustic factor analysis; acoustic feature space; acoustic feature vector; channel mismatch compensation; close to zero eigenvalue; covariance structure; decorrelation analysis; feature covariance matrix; feature dimensionality reduction; feature transformation; front-end mixture; i-vector; linear transformation; microphone; mixture dependent feature transformation; normalization; probabilistic feature alignment; probabilistic linear discriminant analysis; robust speaker verification; signal subspace; speech enhancement scheme; super vector space; telephone; universal background model; Acoustics; Covariance matrix; Eigenvalues and eigenfunctions; Feature extraction; Gain; Speaker recognition; Vectors; Acoustic feature enhancement; factor analysis; probabilistic principal component analysis; speaker verification;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2012.2226161

Filename

6338275

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=742122