Regional Information Center for Science and Technology (RICEST) - MLLR transforms of self-organized units as features in speaker recognition

DocumentCode :

3163477

Title :

MLLR transforms of self-organized units as features in speaker recognition

Author :

Siu, Man-Hung ; Lang, Omer ; Gish, Herbert ; Lowe, Steve ; Chan, Arthur ; Kimball, Owen

Author_Institution :

Raytheon BBN Technologies, Cambridge, MA, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4385

Lastpage :

4388

Abstract :

Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and to provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. Estimating the adaptation parameters requires a speech recognizer in the SR domain, which in turn requires transcribed data for recognizer training. This limits the approach to domains where training transcriptions are available. To generalize the adaptation-parameter approach to domains without transcriptions, we propose the use of self-organized unit (SOU) recognizers, which can be trained without supervision (that is, without transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that MLLR parameters estimated from SOU recognizers give performance comparable to that of systems using matched recognizers. Systems using SOU recognizers also outperform those using cross-lingual recognizers. When we fused the SOU and word recognizers, the SR equal error rate (EER) was reduced by another 15%. This suggests that SOU recognizers can be useful whether or not transcribed data for recognizer training are available.

Keywords :

Gaussian processes; matrix algebra; maximum likelihood estimation; regression analysis; speaker recognition; GMM-based systems; Gaussian mixture model; MLLR transforms; NIST speaker recognition evaluation extended data set; SOU recognizers; SR EER; SR equal error rate; SRE2002 extended data set; acoustic-based SR systems; complementary information; cross-lingual recognizers; matched recognizers; maximum likelihood linear regression adaptation matrices; self-organized unit recognizers; speaker adaptation parameters; word recognizers; Acoustics; Adaptation models; Hidden Markov models; Speech recognition; Strontium; Support vector machines; Training; self-organized units; unsupervised learning

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288891

Filename :

6288891

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3163477