Finite-state transducer based modeling of morphosyntax with applications to Hungarian LVCSR

Author

Szarvas, Máté ; Furui, Sadaoki

Author_Institution

Dept. of Comput. Sci., Tokyo Inst. of Technol., Japan

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

This article introduces a novel approach to modeling morphosyntax in morpheme-based speech recognizers. The proposed method is evaluated in our recent Hungarian large vocabulary continuous speech recognition (LVCSR) system, whose architecture is based on the weighted finite-state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. The vocabulary units are morphemes, in order to provide sufficient coverage of the large number of word-forms resulting from affixation and compounding. Besides the standard morpheme N-gram language model, we evaluate a novel stochastic morphosyntactic language model (SMLM) that describes the valid word-forms (morpheme combinations) of the language. Thanks to the flexible transducer-based architecture of the system, the morphosyntactic component is integrated seamlessly with the basic modules, with no need to modify the decoder itself. The proposed SMLM reduces the error rate by 17.9% relative to the baseline trigram system. The morpheme error rate of the best configuration is 14.75% in a 1350 morpheme Hungarian dictation task.
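
The relationship between the two figures quoted in the abstract can be sketched with a short calculation. This is only an illustration, and it rests on an assumption the abstract does not state explicitly: that the 17.9% relative reduction and the 14.75% morpheme error rate refer to the same configuration. The function names are hypothetical, and the implied baseline value is derived here, not reported in the record.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction, as a fraction of the baseline error."""
    return (baseline - improved) / baseline


def implied_baseline(improved: float, rel_reduction: float) -> float:
    """Baseline error rate implied by an improved rate and a relative reduction."""
    return improved / (1.0 - rel_reduction)


# Figures taken from the abstract.
best_mer = 14.75   # morpheme error rate of the best configuration, in percent
rel = 0.179        # 17.9% relative reduction versus the trigram baseline

# ASSUMPTION: both figures describe the same configuration.
baseline_mer = implied_baseline(best_mer, rel)

print(f"implied baseline MER: {baseline_mer:.2f}%")
print(f"absolute reduction:   {baseline_mer - best_mer:.2f} points")
print(f"relative reduction:   {relative_reduction(baseline_mer, best_mer):.1%}")
```

Under that assumption the baseline trigram system would have a morpheme error rate of roughly 18%, so the relative figure corresponds to an absolute gain of a little over three percentage points.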

Keywords

grammars; natural languages; speech recognition; stochastic processes; Hungarian LVCSR; WFST paradigm; affixation; baseline trigram system; compounding; daily newspaper; error rate; finite-state transducer based modeling; fluently read sentences recognition; large vocabulary continuous speech recognition; morpheme Hungarian dictation task; morpheme N-gram language model; morpheme combinations; morpheme error rate reduction; morpheme unit based speech recognizers; morphosyntax; stochastic morphosyntactic language model; transducer-based architecture; weighted finite state transducer paradigm; word-forms; Automatic speech recognition; Decoding; Error analysis; Morphology; Natural languages; Speech recognition; Speech synthesis; Stochastic processes; Transducers; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198794

Filename

1198794