• DocumentCode
    2175124
  • Title

    Using morpheme and syllable based sub-words for Polish LVCSR

  • Author

    Shaik, M. Ali Basha ; El-Desoky Mousa, Amr ; Schlüter, Ralf ; Ney, Hermann

  • Author_Institution
    Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4680
  • Lastpage
    4683
  • Abstract
    Polish is a synthetic language with a high morpheme-per-word ratio. Its high degree of inflection leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities, which poses a challenge for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme and syllable based units is investigated for building sub-lexical LMs. A different type of sub-lexical unit is proposed, based on combining morphemic or syllabic units with their corresponding pronunciations. Thereby, a set of grapheme-phoneme pairs called graphones is used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full words.
  • Keywords
    speech recognition; vocabulary; LM perplexities; OOV; WER; building sublexical LMs; continuous speech recognition; high language model perplexities; high out-of-vocabulary; morpheme; Polish LVCSR; syllable based subword; word error rate; Adaptation models; Computational modeling; Error analysis; Joints; Speech; Speech recognition; Vocabulary; Polish; graphone; language model; morpheme; syllable
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2011.5947399
  • Filename
    5947399