DocumentCode :
2768941
Title :
Dynamic language modeling for a daily broadcast news transcription system
Author :
Martins, Ciro ; Teixeira, António ; Neto, João
Author_Institution :
Aveiro Univ., Aveiro
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
165
Lastpage :
170
Abstract :
When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach that dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield a relative reduction of 65.2% in OOV rate and 6.6% in word error rate (WER).
Keywords :
information retrieval; natural language interfaces; speech recognition; broadcast news data; daily adaptation approach; daily broadcast news transcription system; dynamic language modeling; information retrieval engine; morpho-syntactic technique; multi-pass speech recognition process; unsupervised adaptation approach; Automatic speech recognition; Broadcasting; Data mining; Engines; Information retrieval; Natural languages; Speech recognition; Training data; Vocabulary; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430103
Filename :
4430103