Regional Information Center for Science and Technology (RICeST) - Eigenspace-based MLLR with speaker adaptive training in large vocabulary conversational speech recognition

DocumentCode :

417171

Title :

Eigenspace-based MLLR with speaker adaptive training in large vocabulary conversational speech recognition

Author :

Doumpiotis, V. ; Deng, Yonggang

Author_Institution :

Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA

Volume :

Year :

2004

Date :

17-21 May 2004

Abstract :

Speaker adaptive training (SAT), which reduces inter-speaker variability, is combined with eigenspace-based maximum likelihood linear regression (eigenMLLR) adaptation, which takes advantage of prior knowledge about the test speaker's linear transforms. During training, SAT generates a set of speaker-independent (SI) Gaussian parameters, along with matched speaker-dependent transforms for all the speakers in the training set. A set of regression-class-dependent eigen transforms is then derived by singular value decomposition (SVD). Normally, during recognition, the test speaker's linear transforms are obtained with MLLR. In this work, the test speaker's linear transforms are instead assumed to be a linear combination of the decomposed eigen transforms. Experimental results on large vocabulary conversational speech recognition (LVCSR) material from the Switchboard corpus show that this strategy performs better than ML-SAT and significantly reduces the number of parameters needed (an 87% reduction is achieved), while still effectively capturing the essential variation between speakers.
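The eigen-transform idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single regression class, affine transforms of shape d x (d + 1), no mean-centering before the SVD, and synthetic toy data. The function names (eigen_transforms, combine, project) are hypothetical. The least-squares projection stands in for the maximum-likelihood weight estimation used in recognition, which needs adaptation statistics that this record does not describe.

```python
import numpy as np

def eigen_transforms(train_transforms, num_eigen):
    """Derive eigen transforms from per-speaker transforms via SVD.

    train_transforms has shape (S, d, d + 1): one affine transform [A | b]
    per training speaker (assumed layout, single regression class).
    """
    num_speakers, rows, cols = train_transforms.shape
    # One vectorised transform per row.
    stacked = train_transforms.reshape(num_speakers, rows * cols)
    # Rows of vt are orthonormal directions in transform space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:num_eigen].reshape(num_eigen, rows, cols)

def combine(basis, weights):
    """Return the linear combination sum_k weights[k] * basis[k]."""
    return np.tensordot(weights, basis, axes=1)

def project(basis, transform):
    """Least-squares weights of a transform on an orthonormal basis.

    Stand-in for maximum-likelihood weight estimation (assumption).
    """
    k = basis.shape[0]
    return basis.reshape(k, -1) @ transform.reshape(-1)

rng = np.random.default_rng(0)
dim, num_speakers, num_eigen = 4, 20, 3

# Toy training transforms that lie in a 3-dimensional subspace.
true_basis = rng.normal(size=(num_eigen, dim, dim + 1))
speaker_coeffs = rng.normal(size=(num_speakers, num_eigen))
train = np.tensordot(speaker_coeffs, true_basis, axes=1)

basis = eigen_transforms(train, num_eigen)

# A "test speaker" transform from the same subspace.
test_transform = combine(true_basis, np.array([0.5, -1.0, 2.0]))
weights = project(basis, test_transform)
reconstructed = combine(basis, weights)
max_error = float(np.max(np.abs(reconstructed - test_transform)))

# Per-speaker parameters: full transform versus eigen weights.
full_params = dim * (dim + 1)
eigen_params = num_eigen
```

In this toy setting the test speaker is described by num_eigen weights instead of the dim * (dim + 1) entries of a full transform, which is the source of the parameter saving the abstract reports. The actual saving depends on the feature dimension, the number of regression classes, and the number of eigen transforms kept, none of which are given here.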

Keywords :

eigenvalues and eigenfunctions; learning (artificial intelligence); singular value decomposition; speech recognition; transforms; SVD; eigen transforms; eigenMLLR adaptation; eigenspace-based MLLR; eigenspace-based maximum likelihood linear regression; large vocabulary conversational speech recognition; prior knowledge; singular value decomposition; speaker adaptation; speaker adaptive training; speaker dependent transforms; speaker independent Gaussian parameters; Switchboard corpus; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Natural languages; Parameter estimation; Singular value decomposition; Speech processing; Speech recognition; Testing; Vocabulary

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-8484-9

Type :

conf

DOI :

10.1109/ICASSP.2004.1325996

Filename :

1325996

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=417171