Generalized optimization algorithm for speech recognition transducers

Author

Allauzen, Cyril ; Mohri, Mehryar

Author_Institution

AT&T Labs.-Res., USA

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

Weighted transducers provide a common representation for the components of a speech recognition system. In previous work, we showed that these components can be combined off-line into a single compact recognition transducer that maps HMM state sequences directly to word sequences. The construction of that recognition transducer and its efficiency of use critically depend on a general optimization algorithm, determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition are determinizable. We present a general algorithm that can make an arbitrary weighted transducer determinizable, and we generalize our previous optimization technique for building an integrated recognition transducer to deal with arbitrary weighted transducers used in speech recognition. We report experimental results on a large-vocabulary speech recognition task, How May I Help You (HMIHY), showing that our generalized technique leads to a recognition transducer that performs as well as our original solution in the case of classical n-gram models while inserting fewer special symbols, and that it improves recognition speed substantially, by a factor of 2.6, on the same task when using a class-based language model.
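For orientation, the determinization step named in the abstract can be illustrated with a minimal sketch. It shows the textbook weighted subset construction for acceptors over the tropical semiring (min, +), not the generalized algorithm the paper presents. The function names `determinize` and `path_weight`, the data layout, and the toy automaton are assumptions made for illustration. The sketch is only expected to terminate on determinizable inputs such as the acyclic example used here.

```python
from collections import defaultdict


def determinize(start, finals, arcs):
    """Weighted subset construction over the tropical semiring (min, +).

    start:  initial state of the input acceptor
    finals: dict mapping a final state to its final weight
    arcs:   dict mapping a state to a list of (label, weight, next_state)

    Each state of the result is a frozenset of (state, residual_weight)
    pairs. Hypothetical helper; may not halt on non-determinizable input.
    """
    d_start = frozenset({(start, 0)})
    d_arcs, d_finals = {}, {}
    queue, seen = [d_start], {d_start}
    while queue:
        subset = queue.pop()
        # Final weight: best residual plus original final weight.
        final_candidates = [r + finals[q] for q, r in subset if q in finals]
        if final_candidates:
            d_finals[subset] = min(final_candidates)
        # Group outgoing arcs by label, keeping the best weight per destination.
        by_label = defaultdict(dict)
        for q, r in subset:
            for label, w, nxt in arcs.get(q, []):
                cand = r + w
                if nxt not in by_label[label] or cand < by_label[label][nxt]:
                    by_label[label][nxt] = cand
        d_arcs[subset] = {}
        for label, dests in by_label.items():
            out_w = min(dests.values())  # weight emitted on the new arc
            # Residuals record what each destination still owes.
            target = frozenset((q, w - out_w) for q, w in dests.items())
            d_arcs[subset][label] = (out_w, target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_finals, d_arcs


def path_weight(d_start, d_finals, d_arcs, labels):
    """Weight of the unique path for `labels`, or None if not accepted."""
    state, total = d_start, 0
    for label in labels:
        if label not in d_arcs.get(state, {}):
            return None
        w, state = d_arcs[state][label]
        total += w
    if state not in d_finals:
        return None
    return total + d_finals[state]


# Toy non-deterministic acceptor: two arcs labeled "a" leave state 0.
arcs = {0: [("a", 1, 1), ("a", 3, 2)], 1: [("b", 2, 3)], 2: [("c", 1, 3)]}
finals = {3: 0}
d_start, d_finals, d_arcs = determinize(0, finals, arcs)
print(path_weight(d_start, d_finals, d_arcs, "ab"))
print(path_weight(d_start, d_finals, d_arcs, "ac"))
```

In the toy input, both strings keep the weight they had in the original acceptor, while the result has at most one outgoing arc per label at each state. A production system would more plausibly rely on an existing weighted finite-state transducer library than on a hand-written routine like this one.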

Keywords

grammars; hidden Markov models; speech recognition; transducers; HMM word sequences; class-based language model; compact recognition transducer; determinization; general optimization algorithm; generalized optimization algorithm; integrated recognition transducer; large-vocabulary speech recognition; n-gram models; original solution; recognition speed; speech recognition system; speech recognition transducers; state sequences; weighted automata; weighted transducers; Automata; Automatic speech recognition; Context modeling; Dictionaries; Helium; Hidden Markov models; Natural languages; Speech recognition; Transducers; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198790

Filename

1198790