DocumentCode :
1297267
Title :
Source-Normalized LDA for Robust Speaker Recognition Using i-Vectors From Multiple Speech Sources
Author :
McLaren, Mitchell ; Van Leeuwen, David
Author_Institution :
Centre for Language & Speech Technol., Radboud Univ. Nijmegen, Nijmegen, Netherlands
Volume :
20
Issue :
3
fYear :
2012
fDate :
3/1/2012
Firstpage :
755
Lastpage :
766
Abstract :
The recent development of the i-vector framework for speaker recognition has set a new performance standard in the research field. An i-vector is a compact representation of a speaker's utterance extracted from a total variability subspace. Prior to classification using a cosine kernel, i-vectors are projected into a linear discriminant analysis (LDA) space in order to reduce inter-session variability and enhance speaker discrimination. The accurate estimation of this LDA space from a training dataset is crucial to detection performance. A typical training dataset, however, does not consist of utterances acquired through all sources of interest for each speaker. This introduces systematic variation related to the speech source into the between-speaker covariance matrix and results in an incomplete representation of the within-speaker scatter matrix used for LDA. The recently proposed source-normalized (SN) LDA algorithm improves the robustness of i-vector-based speaker recognition both under mismatched evaluation conditions and under conditions for which inadequate speech resources are available for suitable system development. When evaluated on the recent NIST 2008 and 2010 Speaker Recognition Evaluations (SREs), SN-LDA demonstrated relative improvements over LDA of up to 38% in equal error rate (EER) and 44% in minimum detection cost function (DCF) under mismatched and sparsely resourced evaluation conditions, while also providing improvements in the common telephone-only conditions. Building on these initial developments, this study provides a thorough analysis of how SN-LDA transforms the i-vector space to reduce source variation, and of its robustness to varying evaluation and LDA training conditions. The concept of source normalization is further extended to within-class covariance normalization (WCCN) and data-driven source detection.
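The core idea in the abstract, estimating the LDA between-speaker scatter separately for each speech source so that source offsets are not mistaken for speaker differences, can be illustrated with a short NumPy sketch. It assumes a formulation in which each speaker's mean (computed within one source) is centered on the mean of that source rather than the global mean, and the within-speaker scatter is taken as the total scatter minus this source-normalized between-speaker scatter. This is an illustrative reading of SN-LDA, not code from the paper; the function names `sn_lda` and `cosine_score` are hypothetical, and the WCCN step used in full i-vector systems is omitted.

```python
import numpy as np


def sn_lda(ivectors, speakers, sources, n_dims):
    """Hypothetical sketch of source-normalized LDA (SN-LDA).

    ivectors: (N, D) array of training i-vectors.
    speakers, sources: length-N labels (speaker ID, speech source).
    Returns a (D, n_dims) projection matrix.
    """
    X = np.asarray(ivectors, dtype=float)
    spk = np.asarray(speakers)
    src = np.asarray(sources)
    dim = X.shape[1]

    # Total scatter of all training i-vectors around the global mean.
    centered = X - X.mean(axis=0)
    S_T = centered.T @ centered

    # Source-normalized between-speaker scatter (assumed form): each speaker
    # mean within a source is centered on that source's mean, so a constant
    # source offset does not inflate S_B.
    S_B = np.zeros((dim, dim))
    for s in np.unique(src):
        in_src = src == s
        src_mean = X[in_src].mean(axis=0)
        for p in np.unique(spk[in_src]):
            rows = in_src & (spk == p)
            diff = X[rows].mean(axis=0) - src_mean
            S_B += rows.sum() * np.outer(diff, diff)

    # Within-speaker scatter as the remainder of the total scatter, so
    # between-source variation is treated as nuisance variation.
    S_W = S_T - S_B

    # Solve S_B v = lambda S_W v by whitening with the Cholesky factor of S_W.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_W))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ S_B @ L_inv.T)
    top = np.argsort(eigvals)[::-1][:n_dims]
    return L_inv.T @ eigvecs[:, top]


def cosine_score(w1, w2, A):
    """Cosine similarity of two i-vectors after projection by A (no WCCN)."""
    a, b = A.T @ w1, A.T @ w2
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

If every i-vector is given the same source label, each speaker mean is centered on the global mean and the sketch reduces to conventional LDA. On synthetic data where two sources differ only by a constant offset along one axis and contain disjoint speakers, the conventional projection tends to align with that offset, whereas the source-normalized one favors the speaker-discriminative axis.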
Keywords :
covariance matrices; error statistics; speaker recognition; equal error rate; i-vector; intersession variability; linear discriminant analysis; multiple speech sources; robust speaker recognition; source-normalized LDA algorithm; speaker covariance matrix; speaker discrimination; speaker scatter matrix; speaker utterance; Covariance matrix; Kernel; Linear discriminant analysis; NIST; Speaker recognition; Speech; Training; Cross-channel source variation; linear discriminant analysis (LDA); total variability
fLanguage :
English
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2164533
Filename :
5983477