• DocumentCode
    2329972
  • Title
    Multilingual A-stabil: A new confidence score for multilingual unsupervised training
  • Author
    Vu, Ngoc Thang ; Kraus, Franziska ; Schultz, Tanja
  • Author_Institution
    Cognitive Syst. Lab., Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany
  • fYear
    2010
  • fDate
    12-15 Dec. 2010
  • Firstpage
    183
  • Lastpage
    188
  • Abstract
    This paper presents our work in Automatic Speech Recognition (ASR) in the context of multilingual unsupervised training, with application to Czech. Starting without any transcribed acoustic training data, we built a Czech ASR system by combining cross-language bootstrapping and confidence-based unsupervised training. We present our new method, called “multilingual A-stabil”, to compute confidence scores, and we explore the relative effectiveness of acoustic models from more than one language, such as Russian, Bulgarian, Polish and Croatian, for unsupervised training. Conventional confidence measures such as gamma and A-stabil work well with well-trained acoustic models but have problems with poorly estimated ones; our new method works well in both cases. We describe our multilingual unsupervised training framework, which gives very promising results in our experiments. Using only a small amount of untranscribed data (about 23 hours), we were able to select 80.5% of the audio training data (18.5 hours) with a transcription WER of 14.5%. With cross-lingual bootstrapping, the final best WER on Czech is 23.6% on the development set and 22.9% on the evaluation set, which is very close to the performance of a Czech ASR system trained on 23 hours of manually transcribed audio data (23.1% on the development set and 22.3% on the evaluation set).
  • Keywords
    speech recognition; unsupervised learning; automatic speech recognition; confidence score; cross language bootstrapping; multilingual A-stabil; multilingual unsupervised training; transcribed acoustic training data; multilingual ASR; unsupervised training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2010 IEEE
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-7904-7
  • Electronic_ISBN
    978-1-4244-7902-3
  • Type
    conf
  • DOI
    10.1109/SLT.2010.5700848
  • Filename
    5700848