Regional Information Center for Science and Technology - Stochastic trajectory model with state-mixture for continuous speech recognition

DocumentCode :

2253798

Title :

Stochastic trajectory model with state-mixture for continuous speech recognition

Author :

Illina, Irina ; Gong, Yifan

Author_Institution :

Inst. Nat. de Recherche en Inf. et Autom., Vandoeuvre-les-Nancy, France

Volume :

fYear :

1996

fDate :

3-6 Oct 1996

Firstpage :

342

Abstract :

The problem of acoustic modeling for continuous speech recognition is addressed. To deal with coarticulation effects and interspeaker variability, an extension of the mixture stochastic trajectory model (MSTM) is proposed. MSTM is a segment-based model using phonemes as speech units. In MSTM, the observations of a phoneme are modeled by a set of stochastic trajectories. The trajectories are modeled by a mixture of probability density functions (pdfs) of state sequences. Each state is associated with a multivariate Gaussian density function. We propose to replace the single Gaussian pdf of each state by a mixture of Gaussian pdfs (MSTM with state-mixture, SM-MSTM). The parameters of the model are estimated under the maximum likelihood (ML) criterion, using the expectation-maximisation (EM) algorithm. Tests of the system on a speaker-dependent continuous speech recognition task show a reduction in the word error rate of about 15% over the baseline MSTM, even for an equal number of parameters. Experiments based on a multispeaker continuous speech recognition task do not lead to significant improvement over the baseline system.
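
The likelihood structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal covariances, assumes the segment has already been resampled to one frame per state, and uses hypothetical names (`state_log_likelihood`, `segment_log_likelihood`) and toy parameter values chosen only for illustration. EM training is not shown. Under these assumptions a segment is scored as a weighted sum over trajectories, where each trajectory contributes a product over its states and each state is a mixture of Gaussians.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_diag_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x (assumption:
    diagonal covariance; the abstract only says 'multivariate Gaussian')."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def state_log_likelihood(x, state):
    """State-level mixture: log sum_m c_m N(x; mean_m, var_m).
    `state` is a list of (weight, mean, var) tuples."""
    return log_sum_exp(
        [math.log(c) + log_diag_gaussian(x, mean, var) for c, mean, var in state]
    )


def segment_log_likelihood(frames, trajectories):
    """Trajectory-level mixture: log sum_k w_k prod_q p(frames[q] | state q of k).
    `trajectories` is a list of (weight, [state_0, state_1, ...]) tuples."""
    per_trajectory = []
    for weight, states in trajectories:
        if len(states) != len(frames):
            raise ValueError("segment must have one frame per state")
        log_p = math.log(weight)
        for x, state in zip(frames, states):
            log_p += state_log_likelihood(x, state)
        per_trajectory.append(log_p)
    return log_sum_exp(per_trajectory)


# Toy phoneme model: 2 trajectories, 2 states each, 1-dimensional features.
toy_model = [
    (0.6, [[(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],   # state 0: 2 Gaussians
           [(1.0, [2.0], [0.5])]]),                      # state 1: 1 Gaussian
    (0.4, [[(1.0, [-1.0], [1.0])],
           [(0.7, [0.0], [1.0]), (0.3, [3.0], [2.0])]]),
]
toy_segment = [[0.2], [1.8]]  # two frames, already resampled to two states
toy_score = segment_log_likelihood(toy_segment, toy_model)
```

Setting every state to a single component with weight 1.0 reduces this sketch to a single-Gaussian-per-state trajectory mixture, which corresponds to the baseline structure the abstract describes.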

Keywords :

Gaussian processes; errors; maximum likelihood estimation; parameter estimation; probability; speech recognition; stochastic processes; acoustic modeling; coarticulation effects; continuous speech recognition; expectation-maximisation algorithm; interspeaker variability; mixture stochastic trajectory model; multispeaker continuous speech recognition; multivariate Gaussian density function; parameter estimation; phonemes; probability density functions; segment-based model; speaker-dependent continuous speech recognition; state-mixture; stochastic trajectories; word error rate; Context modeling; Density functional theory; Hidden Markov models; Loudspeakers; Maximum likelihood estimation; Mesons; Polynomials; Speech recognition; Stochastic processes; System testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location :

Philadelphia, PA

Print_ISBN :

0-7803-3555-4

Type :

conf

DOI :

10.1109/ICSLP.1996.607124

Filename :

607124

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2253798