• DocumentCode
    846159
  • Title
    Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
  • Author
    Ma, Jeff Z. ; Deng, Li
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Ont., Canada
  • Volume
    11
  • Issue
    6
  • fYear
    2003
  • Firstpage
    590
  • Lastpage
    602
  • Abstract
    In this paper, we present two efficient strategies for likelihood computation and decoding in a continuous speech recognizer using an underlying nonlinear state-space dynamic model for the hidden speech dynamics. The state-space model has been specially constructed so as to be suitable for the conversational or casual style of speech where phonetic reduction abounds. Two specific decoding algorithms, based on optimal state-sequence estimation for the nonlinear state-space model, are derived, implemented, and evaluated. They successfully overcome the exponential growth in the original search paths by using the path-merging approaches derived from Bayes' rule. We have tested and compared the two algorithms using the speech data from the Switchboard corpus, confirming their effectiveness. Conversational speech recognition experiments using the Switchboard corpus further demonstrated that the use of the new decoding strategies is capable of reducing the recognizer's word error rate compared with two baseline recognizers, including the HMM system and the nonlinear state-space model using the HMM-produced phonetic boundaries, under identical test conditions.
  • Keywords
    Bayes methods; Kalman filters; hidden Markov models; maximum likelihood decoding; maximum likelihood sequence estimation; speech coding; speech recognition; state estimation; state-space methods; Bayes rules; HMM systems; Kalman filter; continuous speech recognizer; conversational speech recognition; decoding strategies; exponential growth; hidden Markov model; hidden speech dynamics; likelihood computation; likelihood decoding; nonlinear state-space model; optimal state-sequence estimation; path merging approaches; phonetic reduction; switchboard corpus; Context modeling; Decoding; Error analysis; Hidden Markov models; Mathematical model; Merging; Speech recognition; State estimation; Stochastic processes; System testing;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2003.818075
  • Filename
    1255447