DocumentCode
3102133
Title
Broadcast news transcription in Central-East European languages
Author
Tarjan, Balazs ; Mozsolics, T. ; Balog, Andras ; Halmos, D. ; Fegyo, Tibor ; Mihajlik, Peter
Author_Institution
THINKTech Research Center, Hungary
fYear
2012
fDate
2-5 Dec. 2012
Firstpage
59
Lastpage
64
Abstract
This paper addresses two main issues: first, how to develop broadcast news transcription systems for Central-East European languages in a short time when only limited language-specific knowledge is available; and second, how to improve an already existing system by using an on-line learning method. Accordingly, we present recognition results of two newly developed news transcription systems for Polish and Romanian, which are trained in a fully data-driven manner based on only a few hours of manual transcriptions and web materials. In addition, an automatic language model updating method is presented for our Hungarian transcription system. Continuous updating of the language model resulted in a 2% relative WER (Word Error Rate) reduction measured over a 3-month period, primarily due to better language model parameter matching for IV (Intra Vocabulary) words and secondarily due to the reduction of OOV (Out Of Vocabulary) words. To the best of our knowledge, the first Romanian broadcast news recognition results are published in this study.
Keywords
Hungarian; LVCSR; Polish; Romanian; broadcast news; cognitive infocommunication; morphologically rich languages; speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cognitive Infocommunications (CogInfoCom), 2012 IEEE 3rd International Conference on
Conference_Location
Kosice, Slovakia
Print_ISBN
978-1-4673-5187-4
Electronic_ISBN
978-1-4673-5186-7
Type
conf
DOI
10.1109/CogInfoCom.2012.6421940
Filename
6421940