Title :
Speaker Verification With Feature-Space MAPLR Parameters
Author :
Zhu, Donglai ; Ma, Bin ; Li, Haizhou
Author_Institution :
Dept. of Human Language Technol., Inst. for Infocomm Res., Singapore, Singapore
fDate :
3/1/2011
Abstract :
This paper studies a new technique that characterizes a speaker by the difference between the speaker and a cohort of background speakers, expressed in the form of feature-space maximum a posteriori linear regression (fMAPLR). The fMAPLR is a linear regression function, also known as an affine transform, that projects speaker-dependent features to speaker-independent ones. It consists of two sets of parameters: bias vectors and transform matrices. The former, representing first-order information, are more robust than the latter, which represent second-order information. We propose a flexible tying scheme that allows the bias vectors and the matrices to be associated with different regression classes, so that both sets of parameters are given sufficient statistics in a speaker verification task. We formulate a maximum a posteriori (MAP) algorithm for estimating the feature transform parameters, which further alleviates possible numerical problems. The fMAPLR parameters are then vectorized and compared via a support vector machine (SVM). We conduct experiments on the National Institute of Standards and Technology (NIST) 2006 and 2008 Speaker Recognition Evaluation databases. The experiments show that the proposed technique consistently outperforms the baseline Gaussian mixture model (GMM)-SVM speaker verification system.
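As a rough illustration of the two ideas in the abstract (an affine feature transform, and stacking its parameters into one vector for an SVM), the following minimal sketch may help. It is not the paper's MAP estimation procedure. The function names, the toy 2-dimensional values, and the stacking order (matrices first, then biases) are all assumptions made for this example. It mimics the flexible tying scheme only in the loose sense that one transform matrix is shared by two bias classes.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_affine(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Return A x + b for a single feature vector x (hypothetical helper)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def vectorize_params(matrices: List[Matrix], biases: List[Vector]) -> Vector:
    """Flatten all matrices (row-major), then all bias vectors, into one vector.

    The ordering is an assumption for this sketch; any fixed ordering works
    as long as it is the same for every speaker.
    """
    flat: Vector = []
    for A in matrices:
        for row in A:
            flat.extend(row)
    for b in biases:
        flat.extend(b)
    return flat


# Toy setup: one matrix regression class shared by two bias regression classes.
A_shared = [[1.0, 0.5], [0.0, 2.0]]
bias_classes = [[0.1, -0.1], [0.3, 0.2]]

# Transform one speaker-dependent feature vector with the first bias class.
x = [2.0, 4.0]
y = apply_affine(A_shared, bias_classes[0], x)

# One fixed-length vector per speaker, suitable as input to an SVM.
supervector = vectorize_params([A_shared], bias_classes)
```

In this toy case the stacked vector has 4 matrix entries plus 2 x 2 bias entries, so 8 dimensions in total. In a real system the dimensions would follow the feature size and the number of regression classes chosen for each parameter type.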
Keywords :
matrix algebra; maximum likelihood estimation; regression analysis; speaker recognition; support vector machines; vectors; NIST 2006 Speaker Recognition Evaluation database; NIST 2008 Speaker Recognition Evaluation database; affine transform; bias vectors; feature-space MAPLR parameters; feature-space maximum a posteriori linear regression; speaker verification; support vector machine; transform matrices; tying scheme; Cepstral analysis; Electrical capacitance tomography; Linear regression; Maximum likelihood linear regression; NIST; Robustness; Spatial databases; Speaker recognition; Statistics; Support vector machines; Feature transform; maximum a posteriori; speaker recognition; support vector machine (SVM)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2010.2051269