DocumentCode :
653726
Title :
Lightly supervised acoustic model training for imprecisely and asynchronously transcribed speech
Author :
Mihajlik, Peter ; Balog, Andras
Author_Institution :
THINKTech Res. Center, Vác, Hungary
fYear :
2013
fDate :
16-19 Oct. 2013
Firstpage :
1
Lastpage :
5
Abstract :
In a variety of speech recognition tasks, a large amount of approximate transcription is available for the audio material, but it is not directly usable for acoustic model training. Whereas roughly time-synchronous closed captions or proper audiobook texts are already used in lightly supervised techniques, the use of less accurate and, at the same time, completely unaligned transcriptions is not straightforward. In this paper we describe our experiments aimed at the automated transcription of Hungarian parliamentary speeches. Essentially, a lightly supervised across-domain acoustic model adaptation/retraining is performed. A low-resource broadcast news model is used to bootstrap the process. Relying on automatic recognition of the parliamentary training speech and on data selection based on dynamic text alignment, a new, task-specific acoustic model is built. For the adaptation to the parliamentary domain, only edited official transcriptions and unaligned speech data are used, without any additional human annotation effort. The adapted acoustic model is applied to unseen target speech in real-time recognition. The difference in word accuracy between the automatic transcription and the human-produced official transcription is only 5%, with both measured against the exact reference text.
Keywords :
acoustic signal processing; audio signal processing; learning (artificial intelligence); natural language processing; speech recognition; text analysis; asynchronously transcribed speech; audio material; automated Hungarian parliamentary speech transcription; automatic parliamentary training speech recognition; dynamic text alignment based data selection; edited official transcriptions; imprecisely transcribed speech; lightly supervised across-domain acoustic model adaptation; lightly supervised across-domain acoustic model retraining; low-resource broadcast news model; process bootstrapping; real-time recognition; task-specific acoustic model; unaligned speech data; unseen target speech; word accuracy difference; Acoustics; Adaptation models; Data models; Filtering; Speech; Speech recognition; Training; acoustic modeling; cross-domain adaptation; large vocabulary continuous speech recognition; lightly supervised training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on
Conference_Location :
Cluj-Napoca
Type :
conf
DOI :
10.1109/SpeD.2013.6682653
Filename :
6682653