Regional Information Center for Science and Technology - Rapid speaker adaptation with speaker adaptive training and non-negative matrix factorization

DocumentCode :

2173741

Title :

Rapid speaker adaptation with speaker adaptive training and non-negative matrix factorization

Author :

Zhang, Xueru ; Demuynck, Kris ; Van hamme, Hugo

Author_Institution :

Dept. of Electr. Eng., Katholieke Univ. Leuven, Leuven, Belgium

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

4456

Lastpage :

4459

Abstract :

In this paper, we describe a novel speaker adaptation algorithm based on Gaussian mixture weight adaptation. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These base vectors encode the correlations between Gaussian activations as learned from the training data. Expressing the speaker-dependent Gaussian mixture weights as a linear combination of a small number of base vectors reduces the number of parameters that must be estimated from the enrollment data. To learn meaningful correlations between Gaussian activations from the training data, the NMF-based weight adaptation is combined with speaker adaptive training based on vocal tract length normalization (VTLN) and feature-space maximum likelihood linear regression (fMLLR). Evaluation on the 5k closed and 20k open vocabulary Wall Street Journal tasks shows a 4% relative word error rate reduction over the speaker-independent recognition system, which already incorporates VTLN. The proposed fast adaptation algorithm, using only a single enrollment sentence, achieves performance similar to that of fMLLR adapted on 40 enrollment sentences.
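
The weight-adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single pool of Gaussians (rather than per-state mixtures), per-speaker Gaussian occupancy counts as the NMF input, and the standard multiplicative updates for the Kullback-Leibler divergence, none of which the abstract specifies. The function names `nmf_kl` and `adapt_weights` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, K, n_iter=200, seed=0):
    """Factor V (Gaussians x training speakers) as W @ H with W, H >= 0.

    Uses multiplicative updates for the KL divergence (an assumption of this
    sketch). Columns of W are the base vectors; they are scaled to sum to 1.
    """
    rng = np.random.default_rng(seed)
    G, S = V.shape
    W = rng.random((G, K)) + 0.1
    H = rng.random((K, S)) + 0.1
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
        WH = W @ H + EPS
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    # Move the scale of each base vector into H so that W @ H is unchanged.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H


def adapt_weights(W, counts, n_iter=100):
    """Estimate K activations for a new speaker with the bases W held fixed.

    `counts` holds the Gaussian occupancy counts from the enrollment data.
    Returns mixture weights W @ h, normalized to sum to 1.
    """
    K = W.shape[1]
    h = np.full(K, counts.sum() / K)
    for _ in range(n_iter):
        wh = W @ h + EPS
        h *= (W.T @ (counts / wh)) / (W.sum(axis=0) + EPS)
    w = W @ h
    return w / w.sum()
```

Only the K entries of `h` are estimated from the enrollment data, instead of one weight per Gaussian, which is why very little enrollment speech can suffice in this kind of scheme.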

Keywords :

Gaussian processes; maximum likelihood estimation; regression analysis; speaker recognition; Gaussian activations; Gaussian mixture weight adaptation; VTLN; fMLLR; feature-space maximum likelihood linear regression; nonnegative matrix factorization; speaker adaptation; speaker adaptive training; speaker dependent Gaussian mixture weights; vocal tract length normalization; Acoustics; Adaptation models; Data models; Hidden Markov models; Silicon; Speech recognition; Training; Speaker adaptation; maximum likelihood linear regression; non-negative matrix factorization; speaker adaptive training; weight adaptation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947343

Filename :

5947343

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2173741