Title :
Sentence-Adapted Factored Language Model for Transcribing Estonian Speech
Author_Institution :
Dept. of Phonetics & Speech Technol., Tallinn Univ. of Technol.
Abstract :
This work presents a 2-pass recognition method for highly inflected agglutinative languages based on an Estonian large vocabulary recognition task. In the first pass, morphemes are used as the basic recognition units in a standard trigram language model. The recognized morphemes are then reconstructed into words using a hidden event language model for compound word detection. In the second pass, the vocabulary of the N-best sentence candidates from the first pass is used to create an adaptive, sentence-specific, word-based language model, which is applied to rescore the N-best hypotheses. The sentence-specific language model is based on the factored language model paradigm and estimates word probabilities from the preceding two words and part-of-speech tags. The method achieves a 7.3% relative word error rate improvement over the baseline system that is used in the first pass.
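The second-pass rescoring idea can be illustrated with a minimal sketch. This is not the authors' implementation: an add-one-smoothed word trigram stands in for the factored language model, the part-of-speech factors are omitted, and all function names, the `lm_weight` parameter, and the score combination are assumptions made for illustration only.

```python
import math
from collections import Counter


def sentence_vocab(nbest):
    """Collect every word that occurs in any first-pass hypothesis."""
    return {w for words, _ in nbest for w in words}


def train_trigram(corpus, vocab):
    """Count trigrams and their two-word contexts.

    Words outside the sentence-specific vocabulary are mapped to <unk>,
    which is what makes the model specific to one sentence (assumption).
    """
    tri, ctx = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + [w if w in vocab else "<unk>" for w in sent] + ["</s>"]
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            ctx[tuple(toks[i - 2:i])] += 1
    return tri, ctx


def logprob(words, tri, ctx, vocab_size):
    """Add-one-smoothed trigram log-probability of one hypothesis."""
    toks = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + 1
        den = ctx[tuple(toks[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total


def rescore(nbest, corpus, lm_weight=1.0):
    """Return the hypothesis with the best combined score.

    nbest  : list of (word_list, first_pass_log_score) pairs
    corpus : list of word lists used as training text
    """
    vocab = sentence_vocab(nbest)
    tri, ctx = train_trigram(corpus, vocab)
    vocab_size = len(vocab) + 2  # sentence vocabulary plus <unk> and </s>
    best_words, _ = max(
        nbest,
        key=lambda h: h[1] + lm_weight * logprob(h[0], tri, ctx, vocab_size),
    )
    return best_words
```

A call such as `rescore(nbest, corpus)` takes the first-pass N-best list together with word-level training text and returns the top-ranked word sequence. In the method described in the abstract, the trigram would be replaced by a factored model that also conditions on part-of-speech tags.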
Keywords :
error statistics; natural languages; speech recognition; word processing; 2-pass recognition method; Estonian large vocabulary recognition task; Estonian speech; N-best sentence candidates; adaptive sentence-specific word-based language model; compound word detection; hidden event language model; highly inflected agglutinative languages; morpheme recognition; part-of-speech tags; sentence-adapted factored language model; trigram language model; word error rate improvement; word probabilities; Cybernetics; Decoding; Error analysis; Event detection; Information technology; Natural languages; Probability; Speech recognition; Training data; Vocabulary;
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
Print_ISBN :
1-4244-0469-X
DOI :
10.1109/ICASSP.2006.1660049