• DocumentCode
    863255
  • Title

    Target-directed mixture dynamic models for spontaneous speech recognition

  • Author

    Ma, Jeff Z. ; Deng, Li

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Ont., Canada
  • Volume
    12
  • Issue
    1
  • fYear
    2004
  • Firstpage
    47
  • Lastpage
    58
  • Abstract
    In this paper, a novel mixture linear dynamic model (MLDM) for speech recognition is developed and evaluated, in which several linear dynamic models are combined (mixed) to represent different vocal-tract-resonance (VTR) dynamic behaviors and the mapping relationships between the VTRs and the acoustic observations. Each linear dynamic model is formulated as a set of state-space equations: the target-directed property of the VTRs is incorporated in the state equation, and a linear regression function is used for the observation equation to approximate the nonlinear mapping relationship. A version of the generalized EM algorithm is developed for learning the model parameters, with the constraint that the VTR targets change at the segmental level (rather than at the frame level) imposed in both the parameter learning and model scoring algorithms. Speech recognition experiments are carried out to evaluate the new model using the N-best rescoring paradigm on a Switchboard task. Compared with a baseline recognizer using a triphone HMM acoustic model, the new recognizer demonstrated improved performance under several experimental conditions. Performance improved as the number of mixture components in the model increased.
  • Keywords
    hidden Markov models; parameter estimation; regression analysis; speech processing; speech recognition; state-space methods; N-best rescoring paradigm; baseline recognizer; generalized EM algorithm; linear regression function; mixture linear dynamic model; nonlinear mapping; phonetic targets; spontaneous speech recognition; state-space equations; target-directed mixture dynamic models; triphone HMM acoustic model; vocal-tract-resonance; Humans; Mathematical model; Nonlinear dynamical systems; Nonlinear equations; Pattern recognition; Production systems; Video recording
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/TSA.2003.818074
  • Filename
    1261271