Title :
Dynamic language modeling for a daily broadcast news transcription system
Author :
Martins, Ciro ; Teixeira, António ; Neto, João
Author_Institution :
Aveiro Univ., Aveiro
Abstract :
When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach that dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield a relative reduction of 65.2% in OOV rate and 6.6% in word error rate (WER).
Keywords :
information retrieval; natural language interfaces; speech recognition; broadcast news data; daily adaptation approach; daily broadcast news transcription system; dynamic language modeling; information retrieval engine; morpho-syntactic technique; multi-pass speech recognition process; unsupervised adaptation approach; Automatic speech recognition; Broadcasting; Data mining; Engines; Information retrieval; Natural languages; Speech recognition; Training data; Vocabulary; World Wide Web
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430103