• DocumentCode
    2902363
  • Title

    Classifying words for improved statistical language models

  • Author

    Jelinek, Frederick ; Mercer, Robert ; Roukos, Salim

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    1990
  • fDate
    3-6 Apr 1990
  • Firstpage
    621
  • Abstract
    A method for assigning a word to many classes based on the context in which the word occurs is presented. A trigram language model is used to determine the classes, which are called statistical synonyms for that word. This classification method is used to build an adaptive language model that incorporates unknown words after their first occurrence by using their statistical synonyms in determining the model's probabilities for the added words. It is shown that the dynamic coverage of the language model increases significantly with a rather low perplexity on the added words.
  • Keywords
    natural languages; probability; speech recognition; statistical analysis; adaptive language model; probabilities; statistical language models; statistical synonyms; trigram language model; words classification; Context modeling; Electronic mail; Error analysis; Insurance; Natural languages; Probability; Speech recognition; Testing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on
  • Conference_Location
    Albuquerque, NM
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.1990.115789
  • Filename
    115789