Regional Information Center for Science and Technology - Lightly supervised acoustic model training for imprecisely and asynchronously transcribed speech

DocumentCode :

653726

Title :

Lightly supervised acoustic model training for imprecisely and asynchronously transcribed speech

Author :

Mihajlik, Peter ; Balog, Andras

Author_Institution :

THINKTech Res. Center, Vác, Hungary

fYear :

2013

fDate :

16-19 Oct. 2013

Firstpage :

Lastpage :

Abstract :

In a variety of speech recognition tasks, a large amount of approximate transcription is available for the audio material but is not directly applicable to acoustic model training. Whereas roughly time-synchronous closed captions or proper audiobook texts are already used in lightly supervised techniques, the utilization of more imperfect and, at the same time, completely unaligned transcriptions is not self-evident. In this paper we describe our experiments aiming at automated transcription of Hungarian parliamentary speeches. Essentially, a lightly supervised across-domain acoustic model adaptation/retraining is performed. A low-resource broadcast news model is used to bootstrap the process. Relying on automatic recognition of parliamentary training speech and on data selection based on dynamic text alignment, a new, task-specific acoustic model is built. For the adaptation to the parliamentary domain, only edited official transcriptions and unaligned speech data are used, without any additional human annotation effort. The adapted acoustic model is applied to unseen target speech in real-time recognition. The word accuracy difference between the automatic and the human-powered official transcription is only 5% (as compared to the exact reference text).
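The data selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the recognizer output is a list of (word, start, end) tuples, uses Python's standard `difflib.SequenceMatcher` as a stand-in for the dynamic text alignment, and keeps only runs where the recognized words agree with the official transcript. The function name `select_matching_runs` and the `min_run` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def select_matching_runs(hypothesis, transcript, min_run=3):
    """Keep audio spans where recognizer output agrees with the transcript.

    hypothesis: list of (word, start_sec, end_sec) tuples from a recognizer
                (assumed format, not taken from the paper).
    transcript: list of words from the edited, unaligned official text.
    min_run:    minimum number of consecutive matching words to keep a span
                (hypothetical threshold).
    Returns a list of (start_sec, end_sec, transcript_words) tuples.
    """
    hyp_words = [word.lower() for word, _, _ in hypothesis]
    ref_words = [word.lower() for word in transcript]

    # autojunk=False avoids difflib's heuristic of ignoring frequent items,
    # which would otherwise drop common words in long sequences.
    matcher = SequenceMatcher(a=hyp_words, b=ref_words, autojunk=False)

    selected = []
    for block in matcher.get_matching_blocks():
        # The final block returned by difflib is a sentinel with size 0.
        if block.size < min_run:
            continue
        start_sec = hypothesis[block.a][1]
        end_sec = hypothesis[block.a + block.size - 1][2]
        words = transcript[block.b:block.b + block.size]
        selected.append((start_sec, end_sec, words))
    return selected
```

Spans selected this way pair a stretch of audio with transcript text that the bootstrap model already recognizes consistently, which is the kind of material a retraining step could use. A real system would likely need a more tolerant alignment and a confidence criterion; those choices are left open here.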

Keywords :

acoustic signal processing; audio signal processing; learning (artificial intelligence); natural language processing; speech recognition; text analysis; asynchronously transcribed speech; audio material; automated Hungarian parliamentary speech transcription; automatic parliamentary training speech recognition; dynamic text alignment based data selection; edited official transcriptions; imprecisely transcribed speech; lightly supervised across-domain acoustic model adaptation; lightly supervised across-domain acoustic model retraining; low-resource broadcast news model; process bootstrapping; real-time recognition; task-specific acoustic model; unaligned speech data; unseen target speech; word accuracy difference; Acoustics; Adaptation models; Data models; Filtering; Speech; Speech recognition; Training; acoustic modeling; cross-domain adaptation; large vocabulary continuous speech recognition; lightly supervised training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on

Conference_Location :

Cluj-Napoca

Type :

conf

DOI :

10.1109/SpeD.2013.6682653

Filename :

6682653

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=653726