• DocumentCode
    3166795
  • Title
    Investigations on the use of morpheme level features in Language Models for Arabic LVCSR
  • Author
    Mousa, Amr El-Desoky ; Schlüter, Ralf ; Ney, Hermann
  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5021
  • Lastpage
    5024
  • Abstract
    A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high Out-of-Vocabulary (OOV) rates and poor Language Model (LM) probabilities. In such cases, morphemes rather than full words are considered a better choice of unit for LMs, since they yield higher lexical coverage and lower LM perplexities. On the other hand, an effective way to increase the robustness of LMs is to incorporate features of words into them. In this paper, we investigate the use of features derived for morphemes rather than words, thereby combining the benefits of morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based and Factored LMs (FLMs) estimated over sequences of morphemes and their features for Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.
  • Keywords
    speech recognition; vocabulary; word processing; FLM; LM probability; OOV rate; WER; Arabic LVCSR; Arabic large vocabulary continuous speech recognition; class-based LM; factored LM; high out-of-vocabulary rate; higher lexical coverage; language model probability; language models; less LM perplexity; morpheme level features; rich morphology; stream-based LM; word error rate; word-based system; Computational modeling; Humans; Interpolation; Lattices; Mathematical model; Speech recognition; USA Councils; class-based; factored; language model; morpheme; stream-based
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289048
  • Filename
    6289048