Regional Information Center for Science and Technology - On the estimation of 'small' probabilities by leaving-one-out

DocumentCode :

1101449

Title :

On the estimation of 'small' probabilities by leaving-one-out

Author :

Ney, Hermann ; Essen, Ute ; Kneser, Reinhard

Author_Institution :

Lehrstuhl für Inf., Tech. Hochschule Aachen, Germany

Volume :

Issue :

fYear :

1995

fDate :

12/1/1995

Firstpage :

1202

Lastpage :

1212

Abstract :

We apply the leaving-one-out concept to the estimation of 'small' probabilities, i.e., the case where the number of training samples is much smaller than the number of possible classes. After deriving the Turing-Good formula in this framework, we introduce several specific models in order to avoid the problems of the original Turing-Good formula. These models are the constrained model, the absolute discounting model and the linear discounting model. These models are then applied to the problem of bigram-based stochastic language modeling. Experimental results are presented for a German and an English corpus.

Keywords :

computational linguistics; generalisation (artificial intelligence); learning systems; maximum likelihood estimation; natural languages; probability; English corpus; German corpus; Turing-Good formula; absolute discounting model; bigram-based stochastic language modeling; constrained model; generalisation; leaving-one-out concept; linear discounting model; Discrete event simulation; Frequency estimation; Lagrangian functions; Smoothing methods; Stochastic processes; Testing; Training data; Vocabulary

fLanguage :

English

Journal_Title :

Pattern Analysis and Machine Intelligence, IEEE Transactions on

Publisher :

IEEE

ISSN :

0162-8828

Type :

jour

DOI :

10.1109/34.476512

Filename :

476512

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1101449