• DocumentCode
    3328771
  • Title
    Hybrid language models for out of vocabulary word detection in large vocabulary conversational speech recognition
  • Author
    Yazgan, Ali ; Saraclar, Murat
  • Author_Institution
    Center for Language & Speech Processing, Johns Hopkins University, Baltimore, MD, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    In this paper, we propose a method for out-of-vocabulary (OOV) word detection and take a step toward open-vocabulary automatic speech recognition. The proposed method uses a hybrid language model that combines words with subword units such as phones or syllables. We describe a detection algorithm based on the posterior count of OOV words given the hybrid model, and compare it with a baseline that uses the posterior probability of the best word string given a conventional word-only model. Experimental results on the Switchboard corpus are presented for different vocabulary sizes. The new method yields a gain of over 10% in OOV word detection. In addition, a modest number of OOV word pronunciations are recovered correctly.
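To make the "posterior count" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the recognizer's output is an N-best list of (token sequence, posterior) pairs rather than a lattice, and that subword units from the hybrid model carry a hypothetical `ph:` prefix. Each maximal run of consecutive subword tokens is treated as one hypothesized OOV word; the posterior count is the posterior-weighted number of such runs. The detection threshold of 0.5 and all function names are hypothetical. A word-only baseline scores an utterance by one minus the posterior of its best word string.

```python
from typing import List, Sequence, Tuple

# Hypothetical convention: subword units (phones/syllables) produced by the
# hybrid LM carry this prefix, e.g. "ph:k"; ordinary words have no prefix.
SUBWORD_PREFIX = "ph:"

NBest = List[Tuple[List[str], float]]


def count_oov_regions(tokens: Sequence[str]) -> int:
    """Count maximal runs of consecutive subword tokens; each run is
    treated as one hypothesized OOV word."""
    runs = 0
    in_run = False
    for tok in tokens:
        is_sub = tok.startswith(SUBWORD_PREFIX)
        if is_sub and not in_run:
            runs += 1  # a new subword run starts here
        in_run = is_sub
    return runs


def oov_posterior_count(nbest: NBest) -> float:
    """Expected number of OOV regions: the sum over hypotheses of
    P(hyp | audio) * (number of OOV regions in hyp). Posteriors are
    renormalized so that the N-best list sums to one."""
    total = sum(p for _, p in nbest)
    if total <= 0:
        raise ValueError("N-best posteriors must have positive mass")
    return sum(p / total * count_oov_regions(toks) for toks, p in nbest)


def detect_oov(nbest: NBest, threshold: float = 0.5) -> bool:
    """Flag the utterance as containing an OOV word when the posterior
    count reaches a (hypothetical) tuned threshold."""
    return oov_posterior_count(nbest) >= threshold


def baseline_score(word_nbest: NBest) -> float:
    """Word-only baseline: 1 - posterior of the best word string; a less
    confident best hypothesis gives a higher OOV score."""
    total = sum(p for _, p in word_nbest)
    return 1.0 - max(p for _, p in word_nbest) / total
```

For example, with hypotheses `["call", "ph:k", "ph:ao", "ph:l", "tomorrow"]` (posterior 0.6), `["call", "paul", "tomorrow"]` (0.3) and `["call", "ph:p", "ph:ao", "all", "ph:z"]` (0.1), the posterior count is 0.6·1 + 0.3·0 + 0.1·2 ≈ 0.8, so `detect_oov` flags the utterance. Counting runs rather than individual subword tokens keeps a multi-phone OOV from being counted several times.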
  • Keywords
    speech processing; speech recognition; vocabulary; OOV words; Switchboard corpus; hybrid language models; large vocabulary conversational speech recognition; open vocabulary automatic speech recognition; out of vocabulary word detection; phones; posterior count; subword units; syllables; word pronunciations; Automatic speech recognition; Broadcasting; Detection algorithms; Error analysis; Machine assisted indexing; Natural languages; Runtime; Speech processing; Speech recognition; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326093
  • Filename
    1326093