Regional Information Center for Science and Technology - Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model

DocumentCode :

1483608

Title :

Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model

Author :

Zhang, Wen-Lin ; Zhang, Wei-Qiang ; Li, Bi-Cheng ; Qu, Dan ; Johnson, Michael T.

Author_Institution :

Dept. of Inf. Sci., Zhengzhou Inf. Sci. & Technol. Inst., Zhengzhou, China

Volume :

Issue :

Year :

2012

Firstpage :

2002

Lastpage :

2015

Abstract :

This paper proposes HMAP, a new hierarchical Bayesian speaker adaptation method that combines the advantages of three conventional algorithms: maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice. The combination results in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information by modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to lie in a low-dimensional subspace. The phone coordinates, which are shared among different speakers, implicitly contain the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
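
Illustrative note: the abstract does not give the HMAP equations, so the sketch below is not the authors' method. It shows only the classical MAP update of a single Gaussian mean, one of the three conventional algorithms the abstract names, to illustrate how a MAP criterion can weigh prior knowledge against varying amounts of adaptation data. The function name map_adapt_mean, the prior weight tau, and the simplification that every frame belongs to the Gaussian with occupancy 1 are assumptions made for this example.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   tau: float = 10.0) -> List[float]:
    """Classical MAP update of one Gaussian mean (not the paper's HMAP).

    Assumption: every frame is assigned to this Gaussian with occupancy 1,
    and tau (hypothetical default) is the weight given to the prior mean.
    """
    n = len(frames)
    dim = len(prior_mean)
    # Accumulate the per-dimension sum of the adaptation frames.
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]
    # Interpolate between the prior mean and the data, weighted by tau and n.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


prior = [0.0, 0.0]
# With two frames the estimate stays close to the prior mean.
few = map_adapt_mean(prior, [[1.0, 2.0]] * 2, tau=10.0)
# With a thousand frames the estimate approaches the sample mean [1.0, 2.0].
many = map_adapt_mean(prior, [[1.0, 2.0]] * 1000, tau=10.0)
```

In this simplified setting, a larger tau keeps the estimate near the speaker-independent prior for longer, while a smaller tau lets it follow the adaptation data sooner.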

Keywords :

Bayes methods; eigenvalues and eigenfunctions; maximum likelihood estimation; natural language processing; principal component analysis; regression analysis; speaker recognition; Bayesian speaker adaptation; HMAP; MLLR; Mandarin Chinese continuous speech recognition; PCA; eigenphone supervector space; eigenvoice; hierarchical probabilistic model; interspeaker correlation information; intraspeaker correlation information; low dimensional speaker subspace; maximum a posteriori; maximum-likelihood linear regression; phone coordinate; phone subspace; principal component analysis; speaker-dependent eigenphone; Adaptation models; Correlation; Hidden Markov models; Principal component analysis; Probabilistic logic; Training; Vectors; Eigenphones; eigenvoices; hierarchical model; maximum a posteriori (MAP); speaker adaptation;

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2193390

Filename :

6178005

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1483608