JFA modeling with left-to-right structure and a new backend for text-dependent speaker recognition

Author

Kenny, Patrick ; Stafylakis, Themos ; Alam, Jahangir ; Kockmann, Marcel

Author_Institution

Centre de Recherche Informatique de Montréal (CRIM), Montreal, QC, Canada

fYear

2015

fDate

19-24 April 2015

Firstpage

4689

Lastpage

4693

Abstract

This paper introduces a new formulation of Joint Factor Analysis (JFA) for text-dependent speaker recognition based on left-to-right modeling with tied mixture HMMs. It accommodates many different ways of extracting multiple features to characterize speakers (features may or may not be HMM state-dependent, they may be modeled with subspace or factorial priors, and these priors may be imputed from text-dependent or text-independent background data). We feed these features to a new, trainable classifier for text-dependent speaker recognition in a manner which is broadly analogous to the i-vector/PLDA cascade in text-independent speaker recognition. We have evaluated this approach on a challenging proprietary dataset consisting of telephone recordings of short English and Urdu pass-phrases collected in Pakistan. By fusing results obtained with multiple front ends, equal error rates of around 2% are achievable.

Keywords

feature extraction; hidden Markov models; mixture models; speaker recognition; JFA modeling; Pakistan; Urdu pass phrases; joint factor analysis; left-to-right structure; mixture hidden Markov model; multiple feature extraction; short English; telephone recordings; text-dependent speaker recognition backend; trainable classifier; Adaptation models; Data models; Feature extraction; Hidden Markov models; Joints; Speaker recognition; Xenon; Joint Factor Analysis; text-dependent speaker recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178860

Filename

7178860