Regional Information Center for Science and Technology - Spoken language mismatch in speaker verification: An investigation with NIST-SRE and CRSS Bi-Ling corpora

DocumentCode :

3585056

Title :

Spoken language mismatch in speaker verification: An investigation with NIST-SRE and CRSS Bi-Ling corpora

Author :

Misra, Abhinav ; Hansen, John H. L.

Author_Institution :

Center for Robust Speech Systems (CRSS), University of Texas at Dallas (UTD), Richardson, TX, USA

Year :

2014

Firstpage :

372

Lastpage :

377

Abstract :

Compensation for mismatch between acoustic conditions in automatic speaker recognition has been widely addressed in recent years. However, performance degradation due to language mismatch has yet to be thoroughly addressed. In this study, we address language mismatch for speaker verification. We select bilingual speaker data from the NIST SRE 04-08 corpora and develop train/test trials for language-matched and language-mismatched conditions. We first show that language variability significantly degrades speaker recognition performance, even with a state-of-the-art i-vector system. Next, we consider two ideas to improve performance: i) we introduce small amounts of multi-lingual speech data to the Probabilistic Linear Discriminant Analysis (PLDA) development set, and ii) we explore phoneme-level analysis to investigate the effect of language mismatch. It is shown that introducing small amounts of multi-lingual seed data within PLDA training yields a significant improvement in speaker verification performance. Also, using data from the CRSS Bi-Ling corpus, we show how various phoneme classes affect speaker verification under language mismatch. This speech corpus consists of bilingual speakers who speak either Hindi or Mandarin in addition to English. Using this corpus, we propose a novel phoneme histogram normalization technique to match the phonetic spaces of two different languages and show a +16.6% relative improvement in speaker verification performance in the presence of language mismatch.

Keywords :

natural language processing; set theory; speaker recognition; statistical analysis; vectors; CRSS Bi-Ling corpora; English; Hindi; Mandarin; NIST SRE 04-08 corpora; PLDA training; acoustic conditions; automatic speaker recognition; bilingual speaker data selection; i-vector system; language variability; multilingual speech data; phoneme histogram normalization technique; phoneme level analysis; phonetic space matching; probabilistic linear discriminant analysis development set; speaker verification; speech utterances; spoken language mismatch; test-trial development; train-trial development; Acoustics; Histograms; NIST; Speaker recognition; Speech; TV; Training; i-vector system; language mismatch; phoneme analysis; speaker verification;

Language :

English

Publisher :

IEEE

Conference_Title :

Spoken Language Technology Workshop (SLT), 2014 IEEE

Type :

conf

DOI :

10.1109/SLT.2014.7078603

Filename :

7078603

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3585056