• DocumentCode
    835985
  • Title
    Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages With Application to Dialectal Arabic
  • Author
    Sarikaya, Ruhi ; Afify, Mohamed ; Deng, Yonggang ; Erdogan, Hakan ; Gao, Yuqing
  • Author_Institution
    IBM T. J. Watson Research Center, Yorktown Heights, NY
  • Volume
    16
  • Issue
    7
  • fYear
    2008
  • Firstpage
    1330
  • Lastpage
    1339
  • Abstract
    Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in the out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments, the underlying lexical items, and additional available information sources about both in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performance improve.
  • Keywords
    language translation; speech recognition; statistical analysis; Arabic morphology; dialectal Arabic; joint morphological-lexical language modeling; language model parameter estimation; machine translation; morphological segments; morphologically rich languages; out-of-vocabulary rate; rich morphology; smooth probability estimation; trigram language models; Entropy; Information technology; Morphology; Natural language processing; Natural languages; Parameter estimation; Predictive models; Robustness; Vocabulary; Joint modeling; language modeling; maximum entropy modeling; morphological analysis
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2008.924591
  • Filename
    4599398