DocumentCode :
2875947
Title :
Incremental language modeling for broadcast news
Author :
Ohtsuki, Katsutoshi ; Nguyen, Long
Author_Institution :
NTT Cyber Space Lab., NTT Corp., Kanagawa
fYear :
2005
fDate :
27 Nov. 2005
Firstpage :
139
Lastpage :
144
Abstract :
In this paper, we address the task of incremental language modeling for automatic transcription of broadcast news speech. Daily broadcast news naturally contains new words that are not in the lexicon of the speech recognition system but are important for downstream applications such as information retrieval or machine translation. To recognize those new words, the lexicon and the language model of the speech recognition system need to be updated periodically. We propose a method of estimating a list of words to be added to the lexicon based on some time-series text data. The experimental results on the RT04 broadcast news data and other TV audio data showed that this method provided a decent and stable reduction in both out-of-vocabulary rates and speech recognition word error rates.
Keywords :
information retrieval; language translation; speech recognition; automatic transcription; broadcast news speech; incremental language modeling; machine translation; speech recognition system; speech recognition word error rates; Broadcasting; Error analysis; Frequency; Laboratories; Natural languages; Space technology; Vocabulary; World Wide Web
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
Type :
conf
DOI :
10.1109/ASRU.2005.1566531
Filename :
1566531