DocumentCode
37731
Title
Decoupling Word-Pair Distance and Co-occurrence Information for Effective Long History Context Language Modeling
Author
Tze Yuang Chong; Rafael E. Banchs; Eng Siong Chng; Haizhou Li
Author_Institution
School of Computer Engineering, Nanyang Technological University, Singapore
Volume
23
Issue
7
fYear
2015
fDate
July 2015
Firstpage
1221
Lastpage
1232
Abstract
In this paper, we propose the use of distance and co-occurrence information of word-pairs to improve language modeling. We show empirically that, for history-context sizes of up to ten words, the extracted distance and co-occurrence information complements the n-gram language model, for which learning long history-contexts is inherently difficult. Evaluated on the Wall Street Journal and Switchboard corpora, the proposed model reduces trigram model perplexity by up to 11.2% and 6.5%, respectively. Compared with the distant bigram model and the trigger model, the proposed model captures far-context information more effectively, as verified in terms of perplexity and computational efficiency, i.e., it requires fewer free parameters to be fine-tuned. Experiments applying the proposed model to speech recognition, text classification, and word prediction tasks showed improved performance.
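The abstract only summarizes the approach, so the following is a minimal illustrative sketch of the general idea, not the paper's exact formulation: word-pair co-occurrence counts are collected separately for each distance within a ten-word history window, and the resulting score is linearly interpolated with a baseline n-gram probability. All names and constants here (train, pair_score, combined_prob, VOCAB_SIZE, alpha, lam) are hypothetical choices for the example.

```python
# Illustrative sketch only: distance-separated word-pair co-occurrence counts
# combined with a baseline n-gram probability by linear interpolation.
from collections import defaultdict

MAX_DIST = 10          # history-context size considered, up to ten words
VOCAB_SIZE = 20000     # assumed vocabulary size, used for add-alpha smoothing

# counts[d][(h, w)] = how often word w was preceded by word h at distance d
counts = defaultdict(lambda: defaultdict(int))
# hist_counts[d][h] = how often word h appeared d positions before any word
hist_counts = defaultdict(lambda: defaultdict(int))

def train(sentences):
    """Accumulate distance-keyed co-occurrence counts from tokenized sentences."""
    for sent in sentences:
        for i, w in enumerate(sent):
            for d in range(1, MAX_DIST + 1):
                if i - d < 0:
                    break
                h = sent[i - d]
                counts[d][(h, w)] += 1
                hist_counts[d][h] += 1

def pair_score(history, w, alpha=0.1):
    """Average the smoothed relative frequency of w given each history word,
    keeping the distance of each word-pair explicit."""
    scores = []
    for d, h in enumerate(reversed(history[-MAX_DIST:]), start=1):
        num = counts[d][(h, w)] + alpha
        den = hist_counts[d][h] + alpha * VOCAB_SIZE
        scores.append(num / den)
    return sum(scores) / len(scores) if scores else 1.0 / VOCAB_SIZE

def combined_prob(history, w, ngram_prob, lam=0.8):
    """Interpolate a baseline n-gram probability with the word-pair
    distance/co-occurrence score."""
    return lam * ngram_prob + (1.0 - lam) * pair_score(history, w)
```

As a usage example, calling train() on a tokenized corpus and then combined_prob(["the", "stock", "market"], "rallied", ngram_prob=0.02) would blend the trigram estimate with the distance-aware word-pair score; the interpolation weight would in practice be tuned on held-out data.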
Keywords
computational complexity; context-sensitive languages; natural language processing; word processing; Switchboard corpora; Wall Street Journal; computational efficiency; distant bigram model; language modeling; long history context language modeling; n-gram language model; trigger model; trigram model perplexity; word-pair distance and co-occurrence information decoupling; Computational modeling; Context; Context modeling; Estimation; IEEE transactions; Speech; Speech processing; Language modeling; speech recognition; text categorization
fLanguage
English
Journal_Title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2015.2425223
Filename
7091895
Link To Document