• DocumentCode
    2175106
  • Title
    Automatically finding semantically consistent n-grams to add new words in LVCSR systems
  • Author
    Lecorvé, Gwénolé ; Gravier, Guillaume ; Sébillot, Pascale
  • Author_Institution
    IRISA, Rennes, France
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4676
  • Lastpage
    4679
  • Abstract
    This paper presents a new method to automatically add n-grams containing out-of-vocabulary (OOV) words to a baseline language model (LM). These n-grams should be grammatically correct and make sense given the meaning of the OOV words. First, the method determines the word sequences, i.e., n-grams, in which the usage of a given OOV word is the most semantically consistent. Then, the conditional probabilities of these n-grams are computed. To do this, semantic relations between words are used to assimilate each OOV word to several equivalent in-vocabulary words. Based on these equivalent words, n-grams from the baseline LM are re-used to find the word sequences to be added and to compute their probabilities. After augmenting the vocabulary and running a recognition process, experiments show that our method yields WER improvements comparable to those obtained with a state-of-the-art open-vocabulary LM.
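    The abstract does not give the exact formulas, so the following is only a rough illustration of the general idea of re-using baseline n-grams through semantically equivalent in-vocabulary words. It assumes a baseline LM stored as a dictionary from n-gram tuples to conditional probabilities, and a hypothetical set of equivalent in-vocabulary words with relatedness weights summing to 1. A new n-gram is derived by substituting the OOV word for an equivalent word, and its probability is taken as the weighted sum over the equivalents that produced it. The names `derive_oov_ngrams`, `equivalents`, and `min_support` are hypothetical, the `min_support` filter is a crude stand-in for "semantic consistency", and renormalization and back-off weight recomputation are omitted. This is not the authors' formulation.

```python
from collections import defaultdict
from typing import Dict, Tuple

NGram = Tuple[str, ...]


def derive_oov_ngrams(
    baseline: Dict[NGram, float],
    oov: str,
    equivalents: Dict[str, float],
    min_support: int = 1,
) -> Dict[NGram, float]:
    """Hypothetical sketch: build n-grams for `oov` from a baseline n-gram table.

    baseline    -- maps an n-gram (w_1, ..., w_n) to P(w_n | w_1 .. w_{n-1})
    oov         -- the new word to add to the vocabulary
    equivalents -- in-vocabulary words assumed semantically equivalent to `oov`,
                   each with a relatedness weight (assumed to sum to 1)
    min_support -- keep a candidate only if at least this many distinct
                   equivalent words produced it (crude consistency filter)
    """
    scores: Dict[NGram, float] = defaultdict(float)
    support: Dict[NGram, set] = defaultdict(set)
    for ngram, prob in baseline.items():
        for i, word in enumerate(ngram):
            weight = equivalents.get(word)
            if weight is None:
                continue
            # Substitute the OOV word at position i (history or predicted word).
            candidate = ngram[:i] + (oov,) + ngram[i + 1:]
            # Weighted combination of the equivalents' probabilities.
            scores[candidate] += weight * prob
            support[candidate].add(word)
    # Unnormalized: a real LM would also need renormalization / back-off updates.
    return {ng: p for ng, p in scores.items() if len(support[ng]) >= min_support}


if __name__ == "__main__":
    toy_lm = {
        ("the", "cat"): 0.2,
        ("the", "dog"): 0.3,
        ("cat", "sleeps"): 0.5,
        ("dog", "sleeps"): 0.4,
        ("cat", "purrs"): 0.5,
        ("the", "car"): 0.1,
    }
    related = {"cat": 0.6, "dog": 0.4}
    for ngram, p in derive_oov_ngrams(toy_lm, "kitten", related, min_support=2).items():
        print(ngram, round(p, 3))
```

With `min_support=2`, ("kitten", "purrs") is dropped because only "cat" supports it, while ("the", "kitten") and ("kitten", "sleeps") are kept since both "cat" and "dog" produce them.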
  • Keywords
    natural language processing; probability; speech recognition; vocabulary; LVCSR systems; baseline language model; consistent n-grams; large vocabulary continuous speech recognition systems; natural language processing; open vocabulary LM; out-of-vocabulary words; word sequences; Adaptation models; Context; History; Semantics; Speech; Speech recognition; Vocabulary; Automatic speech recognition; language modeling; natural language processing; vocabulary adaptation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947398
  • Filename
    5947398