Title :
Enhanced language modelling with phonologically constrained morphological analysis
Author :
Fang, A.C. ; Huckvale, M.
Author_Institution :
Dept. of Phonetics & Linguistics, Univ. Coll. London, UK
Abstract :
Phonologically constrained morphological analysis (PCMA) is the decomposition of words into their component morphemes conditioned by both orthography and pronunciation. The article describes PCMA and its application in large-vocabulary continuous speech recognition to enhance recognition performance in some tasks. Our experiments, based on the British National Corpus and the LOB Corpus for training data and WSJCAM0 for test data, show clearly that PCMA leads to a smaller lexicon, smaller language models, superior word lattices and a decrease in word error rates. PCMA seems to show most benefit in open-vocabulary tasks, where the productivity of a morph unit lexicon substantially reduces out-of-vocabulary rates.
Keywords :
linguistics; modelling; speech recognition; word processing; British National Corpus; LOB Corpus; PCMA; WSJCAM0; component morphemes; enhanced language modelling; language models; large-vocabulary continuous speech recognition; lexicon size; morph unit lexicon productivity; open-vocabulary tasks; orthography; out-of-vocabulary rates; phonologically constrained morphological analysis; pronunciation; recognition performance; test data; word decomposition; word error rates; word lattices; Educational institutions; Error analysis; Hidden Markov models; Lattices; Speech recognition; Statistical analysis; Statistics; Testing; Training data; Vocabulary;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.862081