Regional Information Center for Science and Technology - Hybrid HMM/BN LVCSR system integrating multiple acoustic features

DocumentCode :

394365

Title :

Hybrid HMM/BN LVCSR system integrating multiple acoustic features

Author :

Markov, Konstantin ; Nakamura, Satoshi

Author_Institution :

ATR Spoken Language Translation Res. Labs., Kyoto, Japan

Volume :

fYear :

2003

fDate :

6-10 April 2003

Abstract :

In current HMM-based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, dynamic Bayesian networks (DBNs) allow for easy combination of different features and can exploit the conditional dependencies between them. However, the lack of efficient algorithms has prevented their application to large vocabulary continuous speech recognition (LVCSR). The hybrid HMM/BN acoustic model, in which HMMs model the temporal characteristics of speech and the state probability model is represented by a Bayesian network (BN), provides a trade-off solution to this problem. In this paper we describe the HMM/BN acoustic model and an LVCSR system built upon it. In the HMM/BN model, in addition to the speech observation variable, the state BN has two more discrete variables representing speaker gender and pitch frequency. Evaluation results on the WSJ database showed a lower word error rate than a conventional HMM acoustic model of the same complexity when there is enough training data to estimate reliable HMM/BN parameters.
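The abstract states only that each HMM state's output probability is modeled by a BN containing the speech observation plus two discrete variables for gender and pitch. As a minimal sketch, assuming the factorization P(g|q) P(f|g,q) N(x; mean_gf, diag(var_gf)) with diagonal Gaussians (the dependency structure, the Gaussian form, and every function name below are hypothetical illustrations, not taken from the paper), a state score can be computed by summing out whichever discrete variables are not observed:

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian N(x; mean, diag(var))."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    if not values:
        return -math.inf
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def state_log_likelihood(x, p_gender, p_pitch, gaussians,
                         observed_gender=None, observed_pitch=None):
    """Hypothetical HMM/BN state score log p(x, evidence | q) for one state q.

    Assumed BN (illustrative only, not the paper's exact structure):
        G -> F, G -> X, F -> X
        p(x, g, f | q) = P(g | q) * P(f | g, q) * N(x; mean[g][f], var[g][f])

    p_gender[g]     : P(G = g | q)
    p_pitch[g][f]   : P(F = f | G = g, q), f = quantized pitch class
    gaussians[g][f] : (mean, var) lists for the observation density
    observed_*      : clamp a variable to a known value instead of summing it out
    """
    terms = []
    for g, pg in enumerate(p_gender):
        if observed_gender is not None and g != observed_gender:
            continue
        for f, pf in enumerate(p_pitch[g]):
            if observed_pitch is not None and f != observed_pitch:
                continue
            if pg <= 0.0 or pf <= 0.0:
                continue  # zero-probability configurations contribute nothing
            mean, var = gaussians[g][f]
            terms.append(math.log(pg) + math.log(pf) + log_gauss_diag(x, mean, var))
    return log_sum_exp(terms)


if __name__ == "__main__":
    # Toy state: 2 genders, 2 pitch classes, 2-dimensional feature vectors.
    p_gender = [0.6, 0.4]
    p_pitch = [[0.8, 0.2], [0.3, 0.7]]
    gaussians = [
        [([0.0, 0.0], [1.0, 1.0]), ([1.0, 0.5], [1.0, 2.0])],
        [([-1.0, 0.0], [0.5, 1.0]), ([0.5, -0.5], [1.0, 1.0])],
    ]
    frame = [0.2, -0.1]
    print("hidden G and F  :",
          state_log_likelihood(frame, p_gender, p_pitch, gaussians))
    print("pitch class = 1 :",
          state_log_likelihood(frame, p_gender, p_pitch, gaussians,
                               observed_pitch=1))
```

Passing `observed_gender` or `observed_pitch` clamps that variable to the given class instead of marginalizing it; whether the original system treats gender and pitch as observed or hidden during decoding is not stated in the abstract, so both modes are shown only as options.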

Keywords :

acoustic signal processing; belief networks; hidden Markov models; parameter estimation; probability; spectral analysis; speech recognition; HMM acoustic model; HMM based speech recognition systems; HMM/BN parameters estimation; WSJ database; acoustic spectrum features; articulator positions; dynamic Bayesian networks; gender; hybrid HMM/BN LVCSR system; hybrid HMM/BN acoustic model; large vocabulary continuous speech recognition; multiple acoustic features; pitch frequency; speaker gender; speech observation variable; state probability model; temporal speech characteristics; training data; word error rate; Bayesian methods; Cities and towns; Databases; Error analysis; Frequency; Hidden Markov models; Loudspeakers; Speech recognition; Training data; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-7663-3

Type :

conf

DOI :

10.1109/ICASSP.2003.1198912

Filename :

1198912

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=394365