• DocumentCode
    353653
  • Title
    Variable word rate N-grams
  • Author
    Gotoh, Yoshihiko; Renals, Steve
  • Author_Institution
    Dept. of Comput. Sci., Sheffield Univ., UK
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1591
  • Abstract
    The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional N-gram language models are usually derived under the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or N-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach demonstrates a reduction in perplexity of up to 10%.
  • Keywords
    Poisson distribution; natural languages; smoothing methods; speech processing; speech recognition; Broadcast News task; conventional N-gram language models; discounting schemes; modelling; perplexity reduction; relative frequencies of words; smoothing schemes; variable word rate N-grams; variable word rate assumption; Broadcasting; Computer science; Entropy; Frequency estimation; Information retrieval; Interpolation; Natural languages; Predictive models; Smoothing methods; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2000.861992
  • Filename
    861992