DocumentCode :
2781632
Title :
Feature distributions in exponential language models
Author :
Jiang, Huixing ; Wang, Xiaojie
Author_Institution :
Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2009
fDate :
6-8 Nov. 2009
Firstpage :
252
Lastpage :
256
Abstract :
Considering the distribution of features, rather than just the counts of their appearances in a sequence, makes exponential language models better able to capture global language phenomena. This paper constructs an exponential language model that uses binary-variable distributions of features, and trains it with the minimum sample risk method, utilizing more features and adjusting their parameters. We show that the language model, trained on a Chinese Internet chat corpus, improves the sentence correct rate by up to 19% and the Chinese character correct rate by up to 7.46% compared with the baseline model.
Keywords :
natural language processing; Chinese Internet chat corpus; binary variables; exponential language models; feature distributions; minimum sample risk training method; Buildings; Data mining; Internet; Maximum likelihood estimation; Natural languages; Predictive models; Probability distribution; Smoothing methods; Speech recognition; binary variable's distribution; minimum sample risk
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4898-2
Electronic_ISBN :
978-1-4244-4900-6
Type :
conf
DOI :
10.1109/ICNIDC.2009.5360857
Filename :
5360857