• DocumentCode
    302101
  • Title
    Improving n-gram models by incorporating enhanced distributions
  • Author
    Boyle, P.O. ; Ming, J. ; McMahon, J. ; Smith, F.J.
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Queen's Univ., Belfast, UK
  • Volume
    1
  • fYear
    1996
  • fDate
    7-10 May 1996
  • Firstpage
    168
  • Abstract
    Two methods of improving conventional n-gram statistical language models are examined. The first involves using a new set of n-gram statistics that attempt to improve the ability of a system to identify phrases correctly. The second involves replacing the maximum likelihood unigram component with an optimised distribution. We test these approaches by incorporating them into weighted average [1] and deleted estimate [2] language models trained on a large newspaper corpus. The improvements lead to a reduction in perplexity of 4.5% and 4.9% respectively for these models.
  • Keywords
    estimation theory; natural languages; optimisation; speech recognition; statistical analysis; deleted estimate language model; enhanced distributions; maximum likelihood unigram component; n-gram models; newspaper corpus; optimised distribution; perplexity; phrase identification; statistical language models; weighted average language model; Computer science; Context modeling; Databases; Frequency estimation; Maximum likelihood estimation; Probability distribution; Statistical distributions; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1996.540317
  • Filename
    540317