Regional Information Center for Science and Technology - Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning

DocumentCode :

33669

Title :

Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning

Author :

Wei Rao ; Man-Wai Mak

Author_Institution :

Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Hong Kong, China

Volume :

Issue :

fYear :

2013

fDate :

May-13

Firstpage :

1012

Lastpage :

1022

Abstract :

The success of the recent i-vector approach to speaker verification relies on the capability of i-vectors to capture speaker characteristics and on the subsequent channel compensation methods to suppress channel variability. Typically, given an utterance, an i-vector is determined from the utterance regardless of its length. This paper investigates how the utterance length affects the discriminative power of i-vectors and demonstrates that the discriminative power of i-vectors reaches a plateau quickly as the utterance length increases. This observation suggests that it is possible to make the best use of a long conversation by partitioning it into a number of sub-utterances so that more i-vectors can be produced for each conversation. To increase the number of sub-utterances without sacrificing the representation power of the corresponding i-vectors, repeated applications of frame-index randomization and utterance partitioning are performed. Results on the NIST 2010 speaker recognition evaluation (SRE) suggest that (1) using more i-vectors per conversation can help to find more robust linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) transformation matrices, especially when the number of conversations per training speaker is limited; and (2) increasing the number of i-vectors per target speaker helps the i-vector based support vector machines (SVM) to find better decision boundaries, thus making SVM scoring outperform cosine distance scoring by 19% and 9% in terms of minimum normalized DCF and EER, respectively.
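
The repeated frame-index randomization and utterance partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters `num_parts` and `num_repeats`, and the near-equal split are assumptions made for illustration, and the i-vector extractor itself is left out. A plain cosine score between two vectors is included only to show what the baseline scoring computes.

```python
import math
import random


def partition_with_resampling(frames, num_parts, num_repeats, seed=0):
    """Shuffle frame indices, split them into num_parts chunks, and repeat.

    Hypothetical helper: returns num_parts * num_repeats sub-utterances,
    each a list of frames drawn from the original utterance.
    """
    rng = random.Random(seed)
    n = len(frames)
    sub_utterances = []
    for _ in range(num_repeats):
        indices = list(range(n))
        rng.shuffle(indices)  # frame-index randomization
        for p in range(num_parts):
            # Near-equal split of the shuffled indices (an assumption).
            start = p * n // num_parts
            end = (p + 1) * n // num_parts
            sub_utterances.append([frames[i] for i in indices[start:end]])
    return sub_utterances


def cosine_score(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Stand-in for acoustic feature frames: integers instead of real vectors.
    frames = list(range(10))
    subs = partition_with_resampling(frames, num_parts=4, num_repeats=3)
    print(len(subs))  # number of sub-utterances produced
    print(cosine_score([1.0, 0.0], [1.0, 0.0]))
```

In this sketch each sub-utterance would be passed to an i-vector extractor, so one conversation yields `num_parts * num_repeats` i-vectors rather than one; each round of shuffling uses every frame exactly once across its partitions.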

Keywords :

covariance matrices; speaker recognition; EER; SVM scoring; channel compensation method; channel variability; cosine distance scoring; frame index randomization; i-vector based speaker verification; i-vector based support vector machines; minimum normalized DCF; robust linear discriminant analysis; speaker characteristics; speaker recognition evaluation; utterance length; utterance partitioning; within class covariance normalization transformation matrices; Acoustics; Interviews; NIST; Speech; Support vector machines; Training; Vectors; I-vectors; linear discriminant analysis; speaker verification; support vector machines; utterance partitioning with acoustic vector resampling (UP-AVR);

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2013.2243436

Filename :

6423258

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=33669