Title :
Efficient multi-lingual unsupervised acoustic model training under mismatch conditions
Author :
Saiko, Masahiro ; Yamamoto, Hitoshi ; Isotani, Ryosuke ; Hori, Chiori
Author_Institution :
Spoken Language Commun. Lab., Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
Abstract :
We propose a new multi-lingual unsupervised acoustic model (AM) training method for low-resourced languages under mismatch conditions. For these languages, there is very limited or no transcribed speech. Thus, unsupervised acoustic modeling using AMs of different (not low-resourced) languages has been proposed. The conventional method has been shown to be effective when acoustic conditions, such as speaking style, are similar between the low-resourced language and the different languages. However, since it is not easy to prepare matched AMs of different languages, a mismatch between each AM and the speech of the low-resourced language occurs in practice during unsupervised acoustic modeling. In this paper, we deal with this mismatch problem. To generate more accurate automatic transcriptions under mismatch conditions, we introduce two things: (1) initial AMs trained with speech of different languages that has been mapped to the phonemes of the low-resourced language, and (2) an iterative process that switches back and forth between training of AMs and adaptation of the initial AMs. The proposed method, without any transcriptions, achieved a word error rate of 32.1% on the evaluation set of IWSLT2011, while the word error rates of the conventional method and the supervised training method were 39.3% and 22.7%, respectively.
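To put the three reported word error rates in relation to each other, the short sketch below derives two figures that the abstract does not state itself: the relative WER reduction over the conventional method, and the share of the conventional-to-supervised gap that the proposed method closes. Only the three WER values come from the abstract; the function names and the choice of these two summary measures are assumptions made for illustration.

```python
# WER values (in percent) as reported in the abstract.
WER_CONVENTIONAL = 39.3  # conventional unsupervised method
WER_PROPOSED = 32.1      # proposed method, no transcriptions
WER_SUPERVISED = 22.7    # supervised training (reference point)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction of `new` over `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


def gap_closed(baseline: float, new: float, topline: float) -> float:
    """Share of the baseline-to-topline gap recovered by `new`, in percent."""
    return (baseline - new) / (baseline - topline) * 100.0


rel = relative_reduction(WER_CONVENTIONAL, WER_PROPOSED)
gap = gap_closed(WER_CONVENTIONAL, WER_PROPOSED, WER_SUPERVISED)

print(f"relative WER reduction: {rel:.1f}%")  # relative WER reduction: 18.3%
print(f"gap closed: {gap:.1f}%")              # gap closed: 43.4%
```

Under these definitions, the 7.2-point absolute improvement corresponds to roughly an 18% relative reduction, and the proposed method recovers a little over two fifths of the difference between the conventional unsupervised result and supervised training.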
Keywords :
acoustic signal processing; natural languages; speech processing; IWSLT2011; automatic transcriptions; iterative process; low-resourced languages; mismatch conditions; mismatch problem; multilingual unsupervised acoustic model; phonemes; supervised training method; unsupervised AM training method; word error rate; Acoustics; Adaptation models; Speech; Speech processing; Speech recognition; Training; Training data; Acoustic modeling; Low-resourced language; Multilingual speech processing; Unsupervised training;
Conference_Title :
Spoken Language Technology Workshop (SLT), 2014 IEEE
DOI :
10.1109/SLT.2014.7078544