Title :
Language Modeling with the Maximum Likelihood Set: Complexity Issues and the Back-off Formula
Author :
Karakos, Damianos ; Khudanpur, Sanjeev
Author_Institution :
Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD
Abstract :
The maximum likelihood set (MLS) was recently introduced in B. Jedynak and S. Khudanpur (2005) as an effective, parameter-free technique for estimating a probability mass function (pmf) from sparse data. The MLS contains all pmfs that assign a higher likelihood to the observed counts than to any other set of counts of the same sample size. In this paper, the MLS is extended to the case of conditional pmf estimation. First, it is shown that, when the criterion for selecting a pmf from the MLS is the KL-divergence, the selected conditional pmf naturally has a back-off form, except for a ceiling on the probability of high-frequency symbols that are not seen in particular contexts. Second, the pmf has a sparse parameterization, leading to efficient algorithms for KL-divergence minimization. Experimental results from bigram and trigram language modeling indicate that pmfs selected from the MLS are competitive with state-of-the-art estimates.
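To make the "back-off form" mentioned in the abstract concrete, the following is a minimal sketch of a generic back-off bigram estimator using absolute discounting. This is not the paper's MLS/KL-divergence selection procedure; the discount value `D` and the helper names are illustrative assumptions.

```python
from collections import Counter

def backoff_bigram(tokens, D=0.5):
    """Return p(w, prev): a back-off conditional bigram pmf.

    Seen bigrams get a discounted ML estimate; the freed probability
    mass is redistributed over unseen successors via the unigram pmf.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    N = len(tokens)

    def p_unigram(w):
        return unigrams[w] / N

    def p(w, prev):
        c_prev = unigrams[prev]
        c_bi = bigrams[(prev, w)]
        if c_bi > 0:
            # discounted ML estimate for bigrams observed in the data
            return (c_bi - D) / c_prev
        # back-off weight: mass freed by discounting the seen bigrams
        seen_types = sum(1 for (u, _) in bigrams if u == prev)
        alpha = D * seen_types / c_prev
        # normalize the unigram pmf over the unseen successors only
        unseen_mass = sum(p_unigram(v) for v in unigrams
                          if (prev, v) not in bigrams)
        if unseen_mass == 0:
            return 0.0  # every vocabulary word was seen after `prev`
        return alpha * p_unigram(w) / unseen_mass

    return p
```

For a fixed context `prev`, the estimates sum to one over the vocabulary, which is the defining property a back-off pmf must preserve.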
Keywords :
computational complexity; maximum likelihood estimation; minimisation; natural languages; probability; KL-divergence minimization; back-off formula; language modeling; maximum likelihood set; probability mass function; sparse parameterization; Bayesian methods; Entropy; Frequency; Maximum likelihood estimation; Minimization methods; Multilevel systems; Natural languages; Parameter estimation; Random variables; Speech processing;
Conference_Titel :
2006 IEEE International Symposium on Information Theory
Conference_Location :
Seattle, WA
Print_ISBN :
1-4244-0505-X
Electronic_ISBN :
1-4244-0504-1
DOI :
10.1109/ISIT.2006.261575