DocumentCode
353653
Title
Variable word rate N-grams
Author
Gotoh, Yoshihiko ; Renals, Steve
Author_Institution
Dept. of Comput. Sci., Sheffield Univ., UK
Volume
3
fYear
2000
fDate
2000
Firstpage
1591
Abstract
The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional N-gram language models are usually derived using the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or N-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach demonstrates a reduction in perplexity of up to 10%.
Keywords
Poisson distribution; natural languages; smoothing methods; speech processing; speech recognition; Broadcast News task; conventional N-gram language models; discounting schemes; modelling; perplexity reduction; relative frequencies of words; smoothing schemes; variable word rate N-grams; variable word rate assumption; Broadcasting; Computer science; Entropy; Frequency estimation; Information retrieval; Interpolation; Natural languages; Predictive models; Smoothing methods; Statistics
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.861992
Filename
861992