Title :
On smoothing techniques for bigram-based natural language modelling
Author :
Ney, Hermann ; Essen, Ute
Author_Institution :
Philips GmbH Forschungslab. Aachen, Germany
Abstract :
The authors study various problems related to smoothing bigram probabilities for natural language modeling: the type of interpolation (linear vs. nonlinear), the optimal estimation of interpolation parameters, and the use of word equivalence classes (parts of speech). A nonlinear interpolation method is proposed that yields significant improvements over linear interpolation in the experimental tests. It is shown that the leaving-one-out method, in combination with the maximum likelihood criterion, can be used efficiently for the optimal estimation of interpolation parameters. In addition, an automatic clustering procedure is developed for finding word equivalence classes using a maximum likelihood criterion. Experimental results are presented for two text databases: a German database with 100,000 words and an English database with 1.1 million words.
Keywords :
natural languages; probability; speech analysis and processing; speech recognition; English database; German database; automatic clustering procedure; bigram probabilities; interpolation parameters; leaving-one-out method; linear interpolation; maximum likelihood criterion; natural language modelling; nonlinear interpolation; optimal estimation; parameter estimation; smoothing techniques; text databases; word equivalence classes; Databases; Equations; Error analysis; Interpolation; Maximum likelihood estimation; Smoothing methods; Testing
Conference_Title :
1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91)
Conference_Location :
Toronto, Ont.
Print_ISBN :
0-7803-0003-3
DOI :
10.1109/ICASSP.1991.150464