Title :
Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning
Author :
Wei Rao ; Man-Wai Mak
Author_Institution :
Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China
Abstract :
The success of the recent i-vector approach to speaker verification relies on the ability of i-vectors to capture speaker characteristics and on the ability of subsequent channel compensation methods to suppress channel variability. Typically, a single i-vector is extracted from each utterance regardless of its length. This paper investigates how utterance length affects the discriminative power of i-vectors and demonstrates that this discriminative power quickly reaches a plateau as the utterance length increases. This observation suggests that a long conversation can be put to better use by partitioning it into a number of sub-utterances, so that more i-vectors can be produced for each conversation. To increase the number of sub-utterances without sacrificing the representation power of the corresponding i-vectors, frame-index randomization and utterance partitioning are applied repeatedly. Results on the NIST 2010 Speaker Recognition Evaluation (SRE) suggest that (1) using more i-vectors per conversation can help to find more robust linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) transformation matrices, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps i-vector based support vector machines (SVMs) find better decision boundaries, so that SVM scoring outperforms cosine distance scoring by 19% in terms of minimum normalized DCF and by 9% in terms of EER.
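The repeated frame-index randomization and partitioning described in the abstract can be pictured with the minimal sketch below. It assumes an utterance is given as a sequence of frame-level acoustic vectors and that each round shuffles the frame indices and splits them into a fixed number of near-equal sub-utterances. The function name `up_avr_partitions`, its parameters, and the choice to keep the full utterance as an extra entry are illustrative assumptions, not the authors' code; i-vector extraction from each sub-utterance is not shown.

```python
import random
from typing import List, Sequence

Frame = Sequence[float]


def up_avr_partitions(frames: Sequence[Frame], num_partitions: int,
                      num_rounds: int, seed: int = 0) -> List[List[Frame]]:
    """Hypothetical sketch of repeated frame-index randomization + partitioning.

    Returns the full utterance (assumed to be kept) followed by
    num_rounds * num_partitions sub-utterances.
    """
    if num_partitions < 1 or num_rounds < 0:
        raise ValueError("num_partitions must be >= 1 and num_rounds >= 0")
    rng = random.Random(seed)
    subs: List[List[Frame]] = [list(frames)]  # keep the whole utterance as well
    n = len(frames)
    for _ in range(num_rounds):
        idx = list(range(n))
        rng.shuffle(idx)  # frame-index randomization for this round
        # Split the shuffled indices into num_partitions near-equal chunks.
        base, extra = divmod(n, num_partitions)
        start = 0
        for p in range(num_partitions):
            size = base + (1 if p < extra else 0)
            subs.append([frames[i] for i in idx[start:start + size]])
            start += size
    return subs


if __name__ == "__main__":
    utterance = [[float(t)] for t in range(1003)]  # toy 1-D "acoustic vectors"
    parts = up_avr_partitions(utterance, num_partitions=4, num_rounds=2)
    print(len(parts), [len(s) for s in parts])
```

With 4 partitions and 2 rounds, one utterance yields 1 + 2 × 4 = 9 frame sets, each of which would then be passed to an i-vector extractor. Because every round reshuffles before splitting, each sub-utterance draws frames from across the whole conversation rather than from one contiguous stretch.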
Keywords :
covariance matrices; speaker recognition; EER; SVM scoring; channel compensation method; channel variability; cosine distance scoring; frame index randomization; i-vector based speaker verification; i-vector based support vector machines; minimum normalized DCF; robust linear discriminant analysis; speaker characteristics; speaker recognition evaluation; utterance length; utterance partitioning; within class covariance normalization transformation matrices; Acoustics; Interviews; NIST; Speech; Support vector machines; Training; Vectors; I-vectors; linear discriminant analysis; speaker verification; support vector machines; utterance partitioning with acoustic vector resampling (UP-AVR);
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2013.2243436