• DocumentCode
    390503
  • Title

    Discriminative HMM stream model for Mandarin digit string speech recognition

  • Author

    Shi, Yuan-yuan ; Liu, Jia ; Liu, Run-sheng

  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • Volume
    1
  • fYear
    2002
  • fDate
    26-30 Aug. 2002
  • Firstpage
    528
  • Abstract
    A conventional hidden Markov model (HMM) based only on spectral features does not achieve high recognition performance for connected Mandarin digits, because highly confusable syllables exist. The main problems of Mandarin digit recognition are analyzed. The analysis shows that precise classification models for Mandarin digits require features extracted from the spectrum, energy and pitch contour, and that these features should be given different emphases for different digits. Each type of feature is therefore used to train a single-stream HMM by maximum likelihood. A multi-stream HMM is then obtained by combining the single-stream HMMs with exponents that weight the log-likelihood of each stream. The exponents are estimated with the generalized probabilistic descent algorithm under the digit minimum classification error rate criterion. The superiority of the multi-stream HMM is demonstrated by a relative reduction of 54.5% in string error rate. For digit strings of unknown length, the string error rate and the digit error rate decrease to 4.66% and 1.31%, respectively.
  • Keywords
    error statistics; hidden Markov models; maximum likelihood estimation; pattern classification; speech recognition; Mandarin digit string speech recognition; classification; confusable syllables; digit minimum classification error rate criteria; discriminative HMM stream model; generalized probabilistic descent algorithm; hidden Markov model; log-likelihood; maximum likelihood; multi-stream HMM; pitch contour; recognition performance; relative string error rate; spectral features; Error analysis; Feature extraction; Hidden Markov models; Humans; Maximum likelihood estimation; Speech recognition; Telephony; Viterbi algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, 2002 6th International Conference on
  • Print_ISBN
    0-7803-7488-6
  • Type

    conf

  • DOI
    10.1109/ICOSP.2002.1181109
  • Filename
    1181109
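
The stream combination described in the abstract, where each single-stream HMM contributes its log-likelihood scaled by an exponent, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the digit labels, the three-stream layout (spectrum, energy, pitch) and every number below are hypothetical, and the generalized probabilistic descent training of the exponents is not shown.

```python
from typing import Dict, Sequence


def combined_score(stream_loglikes: Sequence[float],
                   exponents: Sequence[float]) -> float:
    """Weighted sum of per-stream log-likelihoods.

    Equivalent to multiplying the stream likelihoods after raising
    each one to its exponent, then taking the logarithm.
    """
    if len(stream_loglikes) != len(exponents):
        raise ValueError("one exponent is required per stream")
    return sum(w * ll for w, ll in zip(exponents, stream_loglikes))


def classify(loglikes: Dict[str, Sequence[float]],
             exponents: Dict[str, Sequence[float]]) -> str:
    """Return the digit whose combined multi-stream score is highest."""
    return max(loglikes,
               key=lambda d: combined_score(loglikes[d], exponents[d]))


# Hypothetical log-likelihoods per digit: [spectrum, energy, pitch].
loglikes = {
    "yi": [-10.0, -4.0, -6.0],
    "qi": [-9.5, -5.0, -8.0],
}

# Equal emphasis on every stream (assumed baseline).
uniform = {d: [1.0, 1.0, 1.0] for d in loglikes}

# Digit-dependent exponents (made-up values standing in for trained ones).
tuned = {
    "yi": [1.0, 1.0, 1.0],
    "qi": [1.0, 0.5, 0.5],
}

print(classify(loglikes, uniform))
print(classify(loglikes, tuned))
```

With these made-up values the decision changes once the exponents differ per digit, which is the effect that discriminative training of the exponents is meant to exploit.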