Title :
Feature distributions in exponential language models
Author :
Jiang, Huixing ; Wang, Xiaojie
Author_Institution :
Center for Intell. Sci. & Technol., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Considering the distribution of features, rather than just the counts of their occurrences in a sequence, makes exponential language models more powerful at capturing global language phenomena. This paper constructs an exponential language model based on the binary-variable distributions of features, and trains it with the minimum sample risk method, utilizing more features and adjusting their parameters. We show that the model, trained on a Chinese Internet chat corpus, achieves up to a 19% improvement in sentence correct rate and up to a 7.46% improvement in Chinese character correct rate compared to the baseline model.
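The contrast the abstract draws, count-based versus binary (presence/absence) features in a log-linear model, can be illustrated with a minimal sketch. This assumes the standard exponential-model form score(s) = exp(Σᵢ wᵢ·fᵢ(s)); the function and variable names below are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def exp_lm_score(sentence, weights, feature_fn):
    """Unnormalized log-linear (exponential) score: exp(sum_i w_i * f_i(s))."""
    feats = feature_fn(sentence)
    return math.exp(sum(weights.get(f, 0.0) * v for f, v in feats.items()))

def count_features(sentence):
    """Count-based features: how many times each word occurs."""
    return Counter(sentence.split())

def binary_features(sentence):
    """Binary features: 1 if the word occurs at all, regardless of count."""
    return {w: 1 for w in set(sentence.split())}
```

With weights {"a": 1.0}, the sentence "a a b" scores exp(2) under count features but exp(1) under binary features, showing how the binary view abstracts away raw repetition.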
Keywords :
natural language processing; Chinese Internet chat corpus; binary variables; exponential language models; feature distributions; minimum sample risk training method; Buildings; Data mining; Internet; Maximum likelihood estimation; Natural language processing; Natural languages; Predictive models; Probability distribution; Smoothing methods; Speech recognition; Exponential language models; binary variable's distribution; minimum sample risk;
Conference_Titel :
2009 IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC 2009)
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4898-2
Electronic_ISBN :
978-1-4244-4900-6
DOI :
10.1109/ICNIDC.2009.5360857