An N-best candidates-based discriminative training for speech recognition applications

Author

Chen, Jung-kuei ; Soong, Frank K.

Author_Institution

Telecommun. Lab., Minist. of Commun., Chung-Li, Taiwan

Volume

2

Issue

1

fYear

1994

Firstpage

206

Lastpage

216

Abstract

The authors propose an N-best candidates-based discriminative training procedure for constructing high-performance HMM speech recognizers. The algorithm has two distinct features: N-best hypotheses are used to train discriminative models, and a new frame-level loss function is minimized to improve the separation between correct and incorrect hypotheses. The N-best candidates are decoded with the authors' recently proposed tree-trellis fast search algorithm. The new frame-level loss function, defined as a halfwave-rectified log-likelihood difference between the correct and competing hypotheses, is minimized over all training tokens by adjusting the HMM parameters along a gradient descent direction. The procedure was tested on two speech recognition tasks: speaker-independent, small-vocabulary (ten Mandarin Chinese digits) continuous speech recognition, and speaker-trained, large-vocabulary (5000 commonly used Chinese words) isolated word recognition. Both show significant improvement over traditional maximum-likelihood-trained HMMs. In the connected Chinese digit recognition experiment, the string error rate is reduced from 17.0% to 10.8% for unknown-length decoding and from 8.2% to 5.2% for known-length decoding. In the large-vocabulary isolated word recognition experiment, the recognition error rate is reduced from 7.2% to 3.8%. Additionally, the authors found that using more relaxed decoding constraints when preparing the N-best hypotheses yields better recognition results.
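
The frame-level loss described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the form used here is an assumption: for each frame, the competing hypothesis's log-likelihood minus the correct hypothesis's log-likelihood, clipped at zero (halfwave rectification), summed over frames and over the N-best competitors. The function names and the per-frame score lists are hypothetical, and the sketch omits the gradient descent update of the HMM parameters.

```python
def halfwave_rectified_loss(correct_ll, competing_ll):
    """Sum over frames of max(0, competing - correct).

    correct_ll, competing_ll: per-frame log-likelihoods along the
    correct and the competing hypothesis (assumed frame-aligned).
    Frames where the correct hypothesis already scores higher
    contribute nothing to the loss.
    """
    if len(correct_ll) != len(competing_ll):
        raise ValueError("hypotheses must cover the same number of frames")
    return sum(max(0.0, comp - corr)
               for corr, comp in zip(correct_ll, competing_ll))


def n_best_loss(correct_ll, n_best_lls):
    """Accumulate the frame-level loss over all competing hypotheses."""
    return sum(halfwave_rectified_loss(correct_ll, comp)
               for comp in n_best_lls)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    correct = [-1.0, -2.0, -1.5]
    competitors = [
        [-1.5, -1.0, -1.5],  # beats the correct path on frame 2 only
        [-0.5, -2.5, -1.0],  # beats the correct path on frames 1 and 3
    ]
    print(n_best_loss(correct, competitors))
```

Under this assumed form, only frames where a competitor outscores the correct hypothesis generate a loss, which is what lets minimization concentrate on the confusable portions of each training token.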

Keywords

decoding; hidden Markov models; maximum likelihood estimation; minimisation; speech recognition; HMM parameters; HMM speech recognizers; Mandarin Chinese digits; N-best candidates discriminative training; N-best hypotheses; algorithm; connected Chinese digit recognition; continuous speech recognition; discriminative models; frame-level loss function; gradient descent direction; halfwave rectified log-likelihood difference; isolated word recognition; large vocabulary; minimization; small vocabulary; speaker independent recognition; speaker-trained recognition; speech recognition applications; string error rate; training tokens; tree-trellis fast search algorithm; Decoding; Error analysis; Hidden Markov models; Iterative algorithms; Maximum likelihood estimation; Probability distribution; Speech recognition; Testing; Training data; Vocabulary;

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.260363

Filename

260363