Regional Information Center for Science and Technology - Discriminative HMM stream model for Mandarin digit string speech recognition

DocumentCode :

390503

Title :

Discriminative HMM stream model for Mandarin digit string speech recognition

Author :

Shi, Yuan-yuan ; Liu, Jia ; Liu, Run-sheng

Author_Institution :

Dept. of Electron. Eng., Tsinghua Univ., Beijing, China

Volume :

fYear :

2002

fDate :

26-30 Aug. 2002

Firstpage :

528

Abstract :

The conventional hidden Markov model (HMM), based only on spectral features, performs poorly on connected Mandarin digits because some digit syllables are highly confusable. The main problems of Mandarin digit recognition are analyzed. The analysis shows that building precise classification models for Mandarin digits requires not only features extracted from the spectrum, energy, and pitch contour, but also weighting those features differently for different digits. Accordingly, each type of feature is used to train a single-stream HMM by maximum likelihood. A multi-stream HMM is then obtained by combining the single-stream HMMs with exponents that weight the log-likelihood of each stream. The exponents are estimated with the generalized probabilistic descent (GPD) algorithm under a digit-level minimum classification error (MCE) rate criterion. Experiments demonstrate the superiority of the multi-stream HMM: the string error rate is reduced by 54.5% relative. For unknown-length digit strings, the string error rate and the digit error rate fall to 4.66% and 1.31%, respectively.
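The abstract describes scoring each digit with a weighted sum of per-stream log-likelihoods, g_k = sum over streams s of w[k][s] * log p(O_s | digit k, stream s), and tuning the exponents w by GPD under an MCE criterion. The following is a minimal sketch of one such update for a single training token. It assumes the per-stream log-likelihoods have already been produced by separately trained single-stream HMMs, and it uses the textbook MCE formulation (a misclassification measure with a soft maximum over competing digits, smoothed by a sigmoid). The function names, the per-digit, per-stream weight layout, the parameters eta, gamma, theta and eps, and the rule that each digit's exponents stay positive and sum to the number of streams are illustrative assumptions, not details taken from the paper.

```python
import math

def combined_score(weights, logliks):
    """Multi-stream score: sum over streams of w_s * log p(O_s | model_s)."""
    return sum(w * l for w, l in zip(weights, logliks))

def mce_gpd_step(W, L, true_k, eta=1.0, gamma=1.0, theta=0.0, eps=0.1):
    """One GPD update of per-digit stream exponents for a single training token.

    W[k][s]: exponent of stream s for digit model k (hypothetical layout)
    L[k][s]: log-likelihood of stream s of the token under digit model k
    true_k:  index of the correct digit
    Returns (updated W, smoothed MCE loss evaluated before the update).
    """
    M = len(W)
    g = [combined_score(W[k], L[k]) for k in range(M)]
    rivals = [j for j in range(M) if j != true_k]

    # Soft maximum over competing digits, computed with a max shift for stability
    m = max(eta * g[j] for j in rivals)
    e = {j: math.exp(eta * g[j] - m) for j in rivals}
    total = sum(e.values())
    anti = (m + math.log(total / (M - 1))) / eta

    d = -g[true_k] + anti                                # > 0 means misclassified
    loss = 1.0 / (1.0 + math.exp(-gamma * d + theta))    # sigmoid smoothing
    dl_dd = gamma * loss * (1.0 - loss)

    new_W = [row[:] for row in W]
    # dd/dw[true_k][s] = -L[true_k][s]
    for s in range(len(W[true_k])):
        new_W[true_k][s] -= eps * dl_dd * (-L[true_k][s])
    # dd/dw[j][s] = (soft-max share of rival j) * L[j][s]
    for j in rivals:
        p = e[j] / total
        for s in range(len(W[j])):
            new_W[j][s] -= eps * dl_dd * p * L[j][s]

    # Assumed constraint: exponents stay positive and sum to the number of streams
    for k, row in enumerate(new_W):
        row = [max(w, 1e-6) for w in row]
        scale = len(row) / sum(row)
        new_W[k] = [w * scale for w in row]
    return new_W, loss

# Toy data (made up): two digits, two streams (spectral, pitch); token is digit 0
W = [[1.0, 1.0], [1.0, 1.0]]
L = [[-10.0, -2.0],   # digit 0: poor spectral match, good pitch match
     [-6.0, -5.0]]    # digit 1: better spectral match, worse pitch match
W1, loss0 = mce_gpd_step(W, L, true_k=0)
_, loss1 = mce_gpd_step(W1, L, true_k=0)
```

In this toy case the equally weighted scores misclassify the token, because the spectral stream favours the wrong digit. One update shifts weight toward the pitch stream for the correct digit and lowers the loss. A full trainer would presumably loop over many tokens and repeat until the classification error stops improving.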

Keywords :

error statistics; hidden Markov models; maximum likelihood estimation; pattern classification; speech recognition; Mandarin digit string speech recognition; classification; confusable syllables; digit minimum classification error rate criteria; discriminative HMM stream model; generalized probabilistic descent algorithm; hidden Markov model; log-likelihood; maximum likelihood; multi-stream HMM; pitch contour; recognition performance; relative string error rate; spectral features; Error analysis; Feature extraction; Hidden Markov models; Humans; Maximum likelihood estimation; Speech recognition; Telephony; Viterbi algorithm;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Signal Processing, 2002 6th International Conference on

Print_ISBN :

0-7803-7488-6

Type :

conf

DOI :

10.1109/ICOSP.2002.1181109

Filename :

1181109

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=390503