• DocumentCode
    3102133
  • Title

    Broadcast news transcription in Central-East European languages

  • Author

    Tarjan, Balazs ; Mozsolics, T. ; Balog, Andras ; Halmos, D. ; Fegyo, Tibor ; Mihajlik, Peter

  • Author_Institution
    THINKTech Research Center, Hungary
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    59
  • Lastpage
    64
  • Abstract
    This paper addresses two main issues: first, how to develop broadcast news transcription systems for Central-East European languages in a short time when only limited language-specific knowledge is available; and second, how to improve an existing system by using an on-line learning method. Accordingly, we present recognition results for two newly developed news transcription systems for Polish and Romanian, which are trained in a fully data-driven manner using only a few hours of manual transcriptions and web materials. In addition, an automatic language model updating method is presented for our Hungarian transcription system. Continuous updating of the language model resulted in a 2% relative WER (Word Error Rate) reduction measured over a 3-month period, primarily due to better language model parameter matching for IV (Intra Vocabulary) words and secondarily due to the reduction of OOV (Out Of Vocabulary) words. To the best of our knowledge, the first Romanian broadcast news recognition results are published in this study.
  • Keywords
    Hungarian; LVCSR; Polish; Romanian; broadcast news; cognitive infocommunication; morphologically rich languages; speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cognitive Infocommunications (CogInfoCom), 2012 IEEE 3rd International Conference on
  • Conference_Location
    Kosice, Slovakia
  • Print_ISBN
    978-1-4673-5187-4
  • Electronic_ISBN
    978-1-4673-5186-7
  • Type

    conf

  • DOI
    10.1109/CogInfoCom.2012.6421940
  • Filename
    6421940