Title :
Performance of LVCSR with morpheme-based and syllable-based recognition units
Author_Institution :
ETRI, Taejon, South Korea
Abstract :
For large vocabulary continuous speech recognition (LVCSR) of highly inflected languages, the first step is to determine an appropriate recognition unit that reduces the high out-of-vocabulary rate. We investigate two approaches to selecting recognition units. In the morpheme-based approach, we use the morpheme as the basic recognition unit and merge frequent morpheme pairs into phrases using either a rule-based method or a statistical unit merging method. For statistical unit merging, we investigate the effect of the part-of-speech constraints used in selecting merging candidates. In the syllable-based approach, assuming that only text data and pronunciations are available, we obtain merged syllables using the same statistical merging method, in which pronunciation variation is taken into account. The experimental results show that the statistical merging method with appropriate linguistic constraints yields the best recognition accuracy. Although the syllable-based approach did not achieve comparable performance, it has the advantage of not requiring a part-of-speech tagging system.
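The abstract does not state the merging criterion, so the following is only a minimal sketch of one way frequent adjacent unit pairs could be merged under a part-of-speech constraint. It assumes a raw pair-frequency criterion and a greedy, one-pair-per-iteration loop. The function names (`count_pairs`, `merge_pair`, `merge_units`), the `+` joiner, the `min_count` threshold, and the toy tokens and tags are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def count_pairs(corpus, allowed_pos_pairs=None):
    """Count adjacent (unit, pos) pairs over all sentences.

    corpus: list of sentences, each a list of (unit, pos) tuples.
    allowed_pos_pairs: optional set of (pos_left, pos_right) tuples; when
    given, only pairs whose tags appear in the set are merge candidates.
    """
    counts = Counter()
    for sent in corpus:
        for left, right in zip(sent, sent[1:]):
            if allowed_pos_pairs is None or (left[1], right[1]) in allowed_pos_pairs:
                counts[(left, right)] += 1
    return counts


def merge_pair(corpus, pair, joiner="+"):
    """Replace every adjacent occurrence of `pair` with a single merged unit."""
    left, right = pair
    merged = (left[0] + joiner + right[0], left[1] + joiner + right[1])
    new_corpus = []
    for sent in corpus:
        new_sent, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == left and sent[i + 1] == right:
                new_sent.append(merged)
                i += 2
            else:
                new_sent.append(sent[i])
                i += 1
        new_corpus.append(new_sent)
    return new_corpus, merged


def merge_units(corpus, num_merges, min_count=2, allowed_pos_pairs=None):
    """Greedily merge the most frequent candidate pair, up to num_merges times.

    Assumption: raw frequency is the selection criterion; ties are broken
    by comparing the pairs themselves so the result is deterministic.
    """
    merges = []
    for _ in range(num_merges):
        counts = count_pairs(corpus, allowed_pos_pairs)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_count:
            break
        corpus, merged = merge_pair(corpus, pair)
        merges.append(merged)
    return corpus, merges


if __name__ == "__main__":
    # Toy data with made-up tokens and tags (V = verb stem, E = ending, N = noun).
    toy = [
        [("go", "V"), ("ed", "E"), ("home", "N")],
        [("go", "V"), ("ed", "E"), ("out", "N")],
        [("ed", "E"), ("home", "N")],
    ]
    merged_corpus, merges = merge_units(toy, num_merges=5, allowed_pos_pairs={("V", "E")})
    print(merges)
    print(merged_corpus)
```

Passing `allowed_pos_pairs=None` removes the linguistic constraint, which gives a simple way to compare constrained and unconstrained candidate selection on the same text. In this sketch the merged unit receives a joined tag such as `V+E`, so it is not merged again unless that tag is explicitly added to the allowed set.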
Keywords :
speech recognition; statistical analysis; vocabulary; LVCSR; frequent morpheme pairs; high out-of-vocabulary rate; highly inflected languages; large vocabulary continuous speech recognition; linguistic constraints; merged syllables; morpheme-based recognition units; part-of-speech constraints; part-of-speech tagging system; performance; pronunciation; recognition accuracy; rule-based method; statistical unit merging method; syllable-based recognition units; text data; Broadcasting; Databases; Dictionaries; Error analysis; Merging; Natural languages; Speech recognition; Tagging; Testing; Vocabulary;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.861974