Applying N-best keyword search to continuous speech recognition for telecommunication-based applications

Author

Feng, Ming-Whei

Author_Institution

GTE Labs. Inc., Waltham, MA, USA

fYear

1994

fDate

13-16 Apr 1994

Firstpage

726

Abstract

To achieve high sentence accuracy, an N-best keyword search algorithm was developed for a continuous speech recognizer that models extraneous sounds and noise as well as vocabulary words. The recognizer was developed for telecommunication-based applications, which typically demand high sentence accuracy. Possible approaches to achieving high sentence accuracy include applying more complex speech modeling techniques or employing more knowledge sources in the recognition search. An alternative is to first apply an N-best decoding search that obtains N sentence hypotheses using preselected knowledge source(s), and then rescore those hypotheses using other knowledge source(s) or models. The proposed N-best keyword search algorithm derives all keyword sentence hypotheses and their corresponding likelihood scores time-synchronously. We show that the algorithm is guaranteed to find all sentence hypotheses. To contain the exponentially growing number of hypotheses, the practical implementation applies empirically derived thresholds to prune the search. Recognition experiments on two speech corpora, the TI Connected Digit Corpus and the Road Rally Corpus, show the effectiveness of the proposed method.
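The abstract does not spell out the algorithm, so the following Python sketch is only a toy illustration, under stated assumptions, of the ideas it names: extending all hypotheses in lockstep (time-synchronously), keeping keyword strings while discarding filler (extraneous sounds and noise), pruning with thresholds, and rescoring the resulting N-best list with a second knowledge source. It is not the author's method. Everything in it is hypothetical: the names `FILLER`, `nbest_keyword_search`, `rescore`, `beam`, and `max_hyps`; each step is treated as a whole-word segment rather than an acoustic frame; no HMMs are modeled; and a beam relative to the best score plus a list-size cap stands in for the paper's empirically derived thresholds.

```python
from typing import Callable, Dict, List, Tuple

FILLER = "<filler>"  # hypothetical token standing for extraneous sounds/noise

Hyp = Tuple[Tuple[str, ...], float]  # (keyword sequence, total log-likelihood)


def nbest_keyword_search(
    frame_scores: List[Dict[str, float]], beam: float, max_hyps: int
) -> List[Hyp]:
    """Toy time-synchronous N-best search over keyword strings.

    frame_scores[t] maps each token (a keyword or FILLER) to an assumed
    log-likelihood at step t. Filler tokens are not recorded, so paths that
    differ only in where filler occurs merge, keeping the best score.
    """
    hyps: Dict[Tuple[str, ...], float] = {(): 0.0}
    for scores in frame_scores:
        extended: Dict[Tuple[str, ...], float] = {}
        # Extend every surviving hypothesis by every token at this step.
        for seq, total in hyps.items():
            for token, logp in scores.items():
                new_seq = seq if token == FILLER else seq + (token,)
                new_total = total + logp
                if new_total > extended.get(new_seq, float("-inf")):
                    extended[new_seq] = new_total
        # Threshold pruning: drop hypotheses more than `beam` below the best...
        best = max(extended.values())
        survivors = [(s, v) for s, v in extended.items() if v >= best - beam]
        # ...and cap the list so the hypothesis count cannot grow exponentially.
        survivors.sort(key=lambda kv: kv[1], reverse=True)
        hyps = dict(survivors[:max_hyps])
    return sorted(hyps.items(), key=lambda kv: kv[1], reverse=True)


def rescore(
    nbest: List[Hyp],
    second_source: Callable[[Tuple[str, ...]], float],
    weight: float,
) -> List[Hyp]:
    """Second pass: add a weighted score from another (hypothetical) source."""
    rescored = [(seq, score + weight * second_source(seq)) for seq, score in nbest]
    return sorted(rescored, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two word-sized steps; not data from the paper.
    frames = [
        {"call": -1.0, "dial": -2.0, FILLER: -1.5},
        {"home": -0.5, "office": -1.0, FILLER: -3.0},
    ]
    nbest = nbest_keyword_search(frames, beam=1.0, max_hyps=3)
    # Hypothetical second knowledge source: prefer two-keyword commands.
    grammar = lambda seq: 0.0 if len(seq) == 2 else -2.0
    for seq, score in rescore(nbest, grammar, weight=1.0):
        print(" ".join(seq) or "<empty>", score)
```

Keying hypotheses by keyword string, rather than by full token path, is what makes the list an N-best list of keyword sentences: alternative filler placements collapse into one entry instead of consuming N-best slots.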

Keywords

decoding; speech analysis and processing; speech coding; speech recognition; vocabulary; N-best decoding search; N-best keyword search; Road Rally Corpus; TI Connected Digit Corpus; continuous speech recognition; empirically derived thresholds; knowledge sources; likelihood scores; recognition search; search algorithm; sentence accuracy; speech modeling techniques; telecommunication-based applications; vocabulary words; Acoustic waves; Hidden Markov models; Keyword search; Laboratories; Maximum likelihood decoding; Probability; Speech enhancement; Speech recognition; Viterbi algorithm; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on

Print_ISBN

0-7803-1865-X

Type

conf

DOI

10.1109/SIPNN.1994.344809

Filename

344809