• DocumentCode
    179346
  • Title

    Improving language modeling by using distance and co-occurrence information of word-pairs and its application to LVCSR

  • Author

    Tze Yuang Chong; Rafael E. Banchs; Eng Siong Chng; Haizhou Li

  • Author_Institution
    Temasek Labs., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    4883
  • Lastpage
    4887
  • Abstract
    This paper reports our study on exploiting the distance and co-occurrence information of word-pairs to improve the n-gram language model. We used these two types of information to model the distant context, up to a history length of ten. We also show that the proposed model provides complementary information about the n-gram's context that the n-gram model cannot capture due to data scarcity. Evaluated on the WSJ and SWB-1 corpora, the proposed model reduced the trigram perplexity by up to 11.2% and 6.5%, respectively. In an N-best re-ranking task on the Aurora-4 database, our model helped a hexagram model achieve a ~9% relative improvement in WER.
  • Keywords
    natural language processing; speech recognition; Aurora-4 database; LVCSR; SWB-1 corpora; WSJ corpora; data scarcity; hexagram model; n-gram language modeling improvement; natural language processing tasks; word-pairs co-occurrence information; word-pairs distance information; Adaptation models; Computational modeling; Context; Context modeling; Hidden Markov models; History; Term-distance; language model; term-occurrence
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type

    conf

  • DOI
    10.1109/ICASSP.2014.6854530
  • Filename
    6854530