Investigations on the use of morpheme level features in Language Models for Arabic LVCSR

Author

Mousa, Amr El-Desoky ; Schlüter, Ralf ; Ney, Hermann

Author_Institution

Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

fYear

2012

fDate

25-30 March 2012

Firstpage

5021

Lastpage

5024

Abstract

A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high out-of-vocabulary (OOV) rates and poor Language Model (LM) probabilities. In such cases, the use of morphemes rather than full words is considered a better choice for LMs, since it yields higher lexical coverage and lower LM perplexities. On the other hand, an effective way to increase the robustness of LMs is to incorporate features of words into them. In this paper, we investigate the use of features derived for morphemes rather than words, thereby combining the benefits of morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based, and Factored LMs (FLMs) estimated over sequences of morphemes and their features for Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.

Keywords

speech recognition; vocabulary; word processing; FLM; LM probability; OOV rate; WER; Arabic LVCSR; Arabic large vocabulary continuous speech recognition; class-based LM; factored LM; high out-of-vocabulary rate; higher lexical coverage; language model probability; language models; less LM perplexity; morpheme level features; rich morphology; stream-based LM; word error rate; word-based system; Computational modeling; Humans; Interpolation; Lattices; Mathematical model; Speech recognition; USA Councils; class-based; factored; language model; morpheme; stream-based

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6289048

Filename

6289048