DocumentCode
179346
Title
Improving language modeling by using distance and co-occurrence information of word-pairs and its application to LVCSR
Author
Tze Yuang Chong ; Banchs, Rafael E. ; Eng Siong Chng ; Haizhou Li
Author_Institution
Temasek Labs., Nanyang Technol. Univ., Singapore, Singapore
fYear
2014
fDate
4-9 May 2014
Firstpage
4883
Lastpage
4887
Abstract
This paper reports our study on exploiting the distance and co-occurrence information of word-pairs to improve the n-gram language model. We used these two types of information to model the distant context, up to a history length of ten. We also show that the proposed model provides complementary information about the n-gram's context that the n-gram model cannot capture because of data scarcity. Evaluated on the WSJ and SWB-1 corpora, the proposed model reduced the trigram perplexity by up to 11.2% and 6.5%, respectively. In an N-best re-ranking task on the Aurora-4 database, our model helped a hexagram model achieve a relative WER improvement of about 9%.
Keywords
natural language processing; speech recognition; Aurora-4 database; LVCSR; SWB-1 corpora; WSJ corpora; data scarcity; hexagram model; n-gram language modeling improvement; natural language processing tasks; word-pairs co-occurrence information; word-pairs distance information; Adaptation models; Computational modeling; Context; Context modeling; Hidden Markov models; History; Term-distance; language model; term-occurrence
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6854530
Filename
6854530