Title :
A variable-length category-based n-gram language model
Author :
Niesler, T.R. ; Woodland, P.C.
Author_Institution :
Dept. of Eng., Cambridge Univ., UK
Abstract :
A language model based on word-category n-grams and ambiguous category membership, with n increased selectively to trade compactness for performance, is presented. The use of categories leads intrinsically to a compact model with the ability to generalise to unseen word sequences, and diminishes the sparseness of the training data, thereby making larger n feasible. The language model implicitly involves a statistical tagging operation, which may be used explicitly to assign categories to untagged text. Experiments on the LOB corpus show the optimal model-building strategy to yield improved results with respect to conventional n-gram methods, and when used as a tagger, the model is seen to perform well in relation to a standard benchmark.
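The central idea of the abstract, marginalising a word's probability over its possible categories using a category n-gram, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation; all probability tables, words, and category names here are invented for the example.

```python
# Toy sketch of a category-based n-gram with ambiguous category membership:
# P(word | history) = sum_c P(word | c) * P(c | preceding categories).
# All numbers and symbols below are hypothetical illustration values.

# P(word | category): a word may belong to several categories ("bark"
# can be a NOUN or a VERB), which is the "ambiguous membership" above.
P_word_given_cat = {
    ("the", "DET"): 0.5,
    ("dog", "NOUN"): 0.1,
    ("bark", "NOUN"): 0.05,
    ("bark", "VERB"): 0.2,
}

# P(category | category history): the category n-gram component.
# Here the history is a single preceding category (a category bigram).
P_cat_given_hist = {
    (("DET",), "NOUN"): 0.6,
    (("DET",), "VERB"): 0.1,
}

def word_prob(word, cat_history):
    """Marginalise over all categories the word can belong to."""
    total = 0.0
    for (w, c), p_wc in P_word_given_cat.items():
        if w != word:
            continue
        total += p_wc * P_cat_given_hist.get((cat_history, c), 0.0)
    return total

# After a determiner, the noun reading of "bark" dominates:
# 0.05 * 0.6 + 0.2 * 0.1 = 0.05
p = word_prob("bark", ("DET",))
```

In the paper's model the category history itself is of variable length, extended selectively where the training data supports it; the fixed-length history here is only for brevity.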
Keywords :
grammars; linguistics; natural languages; speech processing; statistical analysis; ambiguous category membership; category assignments; experiments; optimal model-building strategy; performance; standard benchmark; statistical tagging; training data; untagged text; variable-length n-gram language model; word sequences; word-category n-grams; Context modeling; History; Probability density function; Stochastic processes; Tagging; Training data;
Conference_Title :
1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), Conference Proceedings
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.540316