DocumentCode :
3166795
Title :
Investigations on the use of morpheme level features in Language Models for Arabic LVCSR
Author :
Mousa, Amr El-Desoky ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
5021
Lastpage :
5024
Abstract :
A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high out-of-vocabulary (OOV) rates and poor Language Model (LM) probabilities. In such cases, morphemes rather than full words are considered a better choice of LM unit, since they yield higher lexical coverage and lower LM perplexities. On the other hand, an effective way to increase the robustness of LMs is to incorporate word features into them. In this paper, we investigate the use of features derived for morphemes rather than words, thereby combining the benefits of morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based, and Factored LMs (FLMs) estimated over sequences of morphemes and their features for Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.
Keywords :
speech recognition; vocabulary; word processing; FLM; LM probability; OOV rate; WER; Arabic LVCSR; Arabic large vocabulary continuous speech recognition; class-based LM; factored LM; high out-of-vocabulary rate; higher lexical coverage; language model probability; language models; less LM perplexity; morpheme level features; rich morphology; stream-based LM; word error rate; word-based system; Computational modeling; Humans; Interpolation; Lattices; Mathematical model; Speech recognition; USA Councils; class-based; factored; language model; morpheme; stream-based
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6289048
Filename :
6289048