Classifying words for improved statistical language models

Author

Jelinek, Frederick ; Mercer, Robert ; Roukos, Salim

Author_Institution

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

fYear

1990

fDate

3-6 Apr 1990

Firstpage

621

Abstract

A method for assigning a word to many classes based on the context in which the word occurs is presented. A trigram language model is used to determine the classes which are called statistical synonyms for that word. This classification method is used to build an adaptive language model that incorporates unknown words after their first occurrence by using their statistical synonyms in determining the model's probabilities for the added words. It is shown that the dynamic coverage of the language model increases significantly with a rather low perplexity on the added words.

Keywords

natural languages; probability; speech recognition; statistical analysis; adaptive language model; probabilities; statistical language models; statistical synonyms; trigram language model; words classification; Context modeling; Electronic mail; Error analysis; Insurance; Natural languages; Probability; Speech recognition; Testing; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on

Conference_Location

Albuquerque, NM

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.1990.115789

Filename

115789

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2902363