DocumentCode
390503
Title
Discriminative HMM stream model for Mandarin digit string speech recognition
Author
Shi, Yuan-yuan ; Liu, Jia ; Liu, Run-sheng
Author_Institution
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Volume
1
fYear
2002
fDate
26-30 Aug. 2002
Firstpage
528
Abstract
A conventional hidden Markov model (HMM) based only on spectral features does not achieve high recognition performance on connected Mandarin digits, because some syllables are highly confusable. The main problems of Mandarin digit recognition are analyzed. The analysis shows that precise classification models for Mandarin digits require features extracted from the spectrum, energy, and pitch contour, and that these features should be given different emphases for different digits. Each type of feature is therefore used to train a single-stream HMM by maximum likelihood. A multi-stream HMM is then obtained by combining the single-stream HMMs with exponents that weight the log-likelihood of each stream. The exponents are estimated with the generalized probabilistic descent algorithm under the digit minimum classification error rate criterion. The multi-stream HMM is shown to be superior: the string error rate is reduced by 54.5% in relative terms, and for digit strings of unknown length the string error rate and digit error rate decrease to 4.66% and 1.31%, respectively.
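The stream combination and exponent training described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one exponent per digit class and stream, a sigmoid-smoothed misclassification loss, and a plain gradient step as a stand-in for generalized probabilistic descent. The names stream_score and gpd_step and the parameters step, eta, and gamma are hypothetical, and the log-likelihood values are made up for illustration.

```python
import math


def stream_score(loglikes, weights):
    # Weighted sum of per-stream log-likelihoods; this equals the log of the
    # product of stream likelihoods raised to their exponents.
    return sum(w * ll for w, ll in zip(weights, loglikes))


def gpd_step(loglikes, weights, correct, step=0.01, eta=1.0, gamma=1.0):
    # loglikes[c][s]: log-likelihood of class c under stream s (assumed given).
    # weights[c][s]: stream exponent for class c and stream s (assumed layout).
    scores = [stream_score(ll, w) for ll, w in zip(loglikes, weights)]
    rivals = [c for c in range(len(scores)) if c != correct]

    # Smoothed score of the competing classes (log-mean-exp, shifted for stability).
    m = max(eta * scores[c] for c in rivals)
    exps = {c: math.exp(eta * scores[c] - m) for c in rivals}
    total = sum(exps.values())
    anti = (m + math.log(total / len(rivals))) / eta

    # Misclassification measure: positive when the rivals outscore the correct class.
    d = anti - scores[correct]
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    slope = gamma * loss * (1.0 - loss)  # derivative of the sigmoid loss w.r.t. d

    # One gradient-descent step on the exponents.
    new_weights = [list(w) for w in weights]
    for s, ll in enumerate(loglikes[correct]):
        new_weights[correct][s] -= step * slope * (-ll)
    for c in rivals:
        share = exps[c] / total
        for s, ll in enumerate(loglikes[c]):
            new_weights[c][s] -= step * slope * share * ll
    return new_weights, loss


# Illustrative values only: two classes, three streams (spectrum, energy, pitch).
loglikes = [[-10.0, -4.0, -6.0], [-9.0, -8.0, -7.0]]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
new_weights, loss = gpd_step(loglikes, weights, correct=0)
```

Repeating gpd_step over labeled training tokens moves the exponents so that the correct digit's combined score rises relative to its competitors. Any constraint on the exponents (for example, keeping them positive) is not stated in the abstract and is left out of this sketch.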
Keywords
error statistics; hidden Markov models; maximum likelihood estimation; pattern classification; speech recognition; Mandarin digit string speech recognition; classification; confusable syllables; digit minimum classification error rate criteria; discriminative HMM stream model; generalized probabilistic descent algorithm; hidden Markov model; log-likelihood; maximum likelihood; multi-stream HMM; pitch contour; recognition performance; relative string error rate; spectral features; Error analysis; Feature extraction; Hidden Markov models; Humans; Maximum likelihood estimation; Speech recognition; Telephony; Viterbi algorithm;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing, 2002 6th International Conference on
Print_ISBN
0-7803-7488-6
Type
conf
DOI
10.1109/ICOSP.2002.1181109
Filename
1181109