• DocumentCode
    1020920
  • Title

    An N-best candidates-based discriminative training for speech recognition applications

  • Author

    Chen, Jung-kuei ; Soong, Frank K.

  • Author_Institution
    Telecommun. Lab., Minist. of Commun., Chung-Li, Taiwan
  • Volume
    2
  • Issue
    1
  • fYear
    1994
  • Firstpage
    206
  • Lastpage
    216
  • Abstract
    The authors propose an N-best candidates-based discriminative training procedure for constructing high-performance HMM speech recognizers. The algorithm has two distinct features: N-best hypotheses are used for training discriminative models, and a new frame-level loss function is minimized to improve the separation between the correct and incorrect hypotheses. The N-best candidates are decoded with the authors' recently proposed tree-trellis fast search algorithm. The new frame-level loss function, defined as a halfwave rectified log-likelihood difference between the correct and competing hypotheses, is minimized over all training tokens by adjusting the HMM parameters along a gradient descent direction. Two speech recognition applications have been tested: a speaker-independent, small-vocabulary (ten Mandarin Chinese digits) continuous speech recognition task and a speaker-trained, large-vocabulary (5000 commonly used Chinese words) isolated word recognition task. Both show significant performance improvement over traditional maximum-likelihood-trained HMMs. In the connected Chinese digit recognition experiment, the string error rate is reduced from 17.0% to 10.8% for unknown-length decoding and from 8.2% to 5.2% for known-length decoding. In the large-vocabulary isolated word recognition experiment, the recognition error rate is reduced from 7.2% to 3.8%. Additionally, the authors found that using more relaxed decoding constraints when preparing N-best hypotheses yields better recognition results.
  • Keywords
    decoding; hidden Markov models; maximum likelihood estimation; minimisation; speech recognition; HMM parameters; HMM speech recognizers; Mandarin Chinese digits; N-best candidates discriminative training; N-best hypotheses; algorithm; connected Chinese digit recognition; continuous speech recognition; discriminative models; frame-level loss function; gradient descent direction; halfwave rectified log-likelihood difference; isolated word recognition; large vocabulary; minimization; small vocabulary; speaker independent recognition; speaker-trained recognition; speech recognition applications; string error rate; training tokens; tree-trellis fast search algorithm; Decoding; Error analysis; Hidden Markov models; Iterative algorithms; Maximum likelihood estimation; Probability distribution; Speech recognition; Testing; Training data; Vocabulary;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.260363
  • Filename
    260363
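
  • Illustration
    The abstract describes the frame-level loss only in words: a halfwave rectified log-likelihood difference between the correct and competing hypotheses, summed over training tokens. A minimal sketch of one possible reading follows. The function names, the sign convention (competing minus correct), the assumption that both hypotheses provide one log-likelihood per frame, and the plain sum over N-best competitors are all illustrative assumptions and are not taken from the paper itself.

```python
def frame_loss(correct_ll, competing_ll):
    """Halfwave rectified log-likelihood difference, summed over frames.

    Assumption: both arguments hold one log-likelihood per frame, and a frame
    contributes only when the competing hypothesis scores higher than the
    correct one (max(0, competing - correct)).
    """
    if len(correct_ll) != len(competing_ll):
        raise ValueError("hypotheses must cover the same number of frames")
    return sum(max(0.0, c - r) for r, c in zip(correct_ll, competing_ll))


def token_loss(correct_ll, nbest_ll):
    """Sum the frame-level loss over all competing N-best hypotheses.

    Assumption: competitors are weighted equally; the paper may combine
    them differently.
    """
    return sum(frame_loss(correct_ll, comp) for comp in nbest_ll)


# Hypothetical per-frame log-likelihoods for one training token.
correct = [-1.0, -2.0, -3.0]
competitors = [
    [-1.5, -1.0, -3.0],  # beats the correct hypothesis on frame 2 only
    [-0.5, -2.5, -2.0],  # beats it on frames 1 and 3
]
print(token_loss(correct, competitors))
```

    Under these assumptions, frames where the correct hypothesis already scores higher contribute nothing, so a gradient descent step on the HMM parameters would be driven only by the frames where a competitor wins.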