  • DocumentCode
    376273
  • Title
    Analyzing the properties of smoothing methods for language models
  • Author
    Huang, Feng-Long; Yu, Ming-shing
  • Author_Institution
    Dept. of Appl. Math., Nat. Chung-Hsing Univ., Taichung, Taiwan
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    512
  • Abstract
    The authors discuss the properties of several commonly used smoothing methods for language models. Because of the data sparseness problem, smoothing methods are employed to estimate the probability of each n-gram, covering both seen and unseen events. Several smoothing methods are well known: Additive discount, Good-Turing, Witten-Bell, Katz, and Absolute discount. The authors propose a set of properties for evaluating the statistical behavior of these methods. Furthermore, a smoothing scheme, the Huang-Yu method, is presented, which complies with all the proposed properties.
  • Keywords
    linguistics; natural languages; probability; smoothing methods; Absolute discount; Additive discount; Good-Turing; Huang-Yu method; Katz; Witten-Bell; data sparseness problem; frequent smoothing methods; language models; n-gram; natural language processing; seen events; statistical behaviors; unseen events; Entropy; Mathematical model; Maximum likelihood estimation; Natural language processing; Natural languages; Probability; Smoothing methods; Speech synthesis; Tellurium; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics, 2001 IEEE International Conference on
  • Conference_Location
    Tucson, AZ
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7087-2
  • Type
    conf
  • DOI
    10.1109/ICSMC.2001.969865
  • Filename
    969865