Title :
Comparison of discriminative training methods for speaker verification
Author :
Ma, Chengpan ; Chang, Eric
Author_Institution :
Microsoft Research Asia, China
Abstract :
The maximum likelihood estimation (MLE) and Bayesian maximum a posteriori (MAP) adaptation methods for Gaussian mixture models (GMMs) have proven effective and efficient for speaker verification, even though each speaker model is trained using only that speaker's own training utterances. Discriminative criteria, by contrast, aim to increase discriminability by also using out-of-class data. In this paper, we compare the performance of three discriminative training methods on the speaker verification task: the maximum mutual information (MMI), minimum classification error (MCE) and figure of merit (FOM) criteria. Experiments on the 1996 NIST speaker recognition evaluation data set show that the FOM training method outperforms the other two methods in speaker verification performance. In addition, logistic regression is investigated and successfully employed as a discriminative score-normalization technique.
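As a rough illustration of logistic regression used for score normalization, the sketch below maps a raw verification score s to a target posterior sigmoid(a*s + b), fitting a and b on labeled development trials by gradient descent on the cross-entropy. It is a minimal sketch under stated assumptions, not the paper's formulation: it assumes one scalar score per trial (e.g. a log-likelihood ratio), and the function names `fit_score_normalizer` and `normalize` and the development scores are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_score_normalizer(scores: List[float], labels: List[int],
                         lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Hypothetical helper: fit P(target | s) = sigmoid(a * s + b) by batch
    gradient descent on mean cross-entropy. labels: 1 = target, 0 = impostor."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # d(cross-entropy)/d(logit)
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def normalize(score: float, a: float, b: float) -> float:
    # Map a raw score to an estimated target posterior in (0, 1).
    return sigmoid(a * score + b)


# Illustrative (made-up) development trials: raw scores and target/impostor labels.
dev_scores = [2.0, 1.5, 3.0, 0.8, -1.0, -2.0, 0.2, -0.5]
dev_labels = [1, 1, 1, 1, 0, 0, 0, 0]
a, b = fit_score_normalizer(dev_scores, dev_labels)
```

A single global (a, b) keeps the mapping monotonic, so it rescales scores into posteriors without reordering trials; a per-speaker variant would fit separate parameters for each claimed speaker.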
Keywords :
normalising; pattern classification; speaker recognition; NIST speaker recognition evaluation data set; discriminability; discriminative training methods; figure of merit; logistic regression; maximum mutual information; minimum classification error; score normalization technique; speaker model; speaker verification; system performance; training utterances; Asia; Bayesian methods; Logistics; Maximum a posteriori estimation; Maximum likelihood estimation; Maximum likelihood linear regression; Mutual information; NIST; Speaker recognition; System performance;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198749