DocumentCode :
36358
Title :
Applying Multi- and Cross-Lingual Stochastic Phone Space Transformations to Non-Native Speech Recognition
Author :
Imseng, David ; Bourlard, Herve ; Dines, John ; Garner, Philip N. ; Magimai-Doss, Mathew
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Volume :
21
Issue :
8
fYear :
2013
fDate :
Aug. 2013
Firstpage :
1713
Lastpage :
1726
Abstract :
In the context of hybrid HMM/MLP Automatic Speech Recognition (ASR), this paper investigates a new type of stochastic phone space transformation, which maps “source” phone (or phone HMM state) posterior probabilities, as obtained at the output of a Multilayer Perceptron (MLP), into “destination” phone (or phone HMM state) posterior probabilities. The resulting stochastic matrix transformation can be used within the same language to automatically adapt to different phone formats (e.g., IPA) or across languages. Additionally, as shown here, it can also be applied successfully to non-native speech recognition. In the same spirit as MLLR adaptation or MLP adaptation, the proposed approach directly maps posterior distributions and is trained on a small amount of adaptation data by optimizing a Kullback-Leibler based cost function with a modified version of an iterative EM algorithm. On a non-native English database (HIWIRE), comparing multiple setups (monophone and triphone mapping, MLLR adaptation), we show that the resulting posterior mapping yields state-of-the-art results using very limited amounts of adaptation data in mono-, cross- and multi-lingual setups. We also show that “universal” phone posteriors, trained on a large amount of multilingual data, can be transformed to English phone posteriors, resulting in an ASR system that significantly outperforms a system trained on English data only. Finally, we demonstrate that the proposed approach outperforms alternative data-driven, as well as knowledge-based, mapping techniques.
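As a rough illustration of the idea in the abstract, the sketch below applies a row-stochastic matrix to a vector of source phone posteriors to obtain destination phone posteriors, then scores the result with a Kullback-Leibler divergence. It is not the authors' implementation: the function names, the toy matrix values, the reference distribution, and the direction of the divergence are all assumptions made for illustration, and the EM-style training of the matrix is omitted.

```python
import math


def map_posteriors(source_post, transform):
    """Map source phone posteriors to destination phone posteriors.

    transform[s][d] is an assumed P(destination d | source s); every row
    is expected to sum to 1, so the output is again a distribution.
    """
    n_dest = len(transform[0])
    return [
        sum(source_post[s] * transform[s][d] for s in range(len(source_post)))
        for d in range(n_dest)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), skipping zero entries of p and flooring q at eps."""
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


# Toy values invented for this example: 2 source phones, 3 destination phones.
transform = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
]
source_post = [0.7, 0.3]  # hypothetical MLP output for one frame

dest_post = map_posteriors(source_post, transform)

# Hypothetical reference: the frame is labelled as destination phone 0.
reference = [1.0, 0.0, 0.0]
cost = kl_divergence(reference, dest_post)
```

Because each row of the matrix is itself a probability distribution, the mapped vector stays normalized without any extra rescaling, which is presumably what makes a stochastic matrix a natural choice for mapping posteriors.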
Keywords :
expectation-maximisation algorithm; hidden Markov models; iterative methods; matrix algebra; probability; speech recognition; statistical distributions; ASR system; HIWIRE; Kullback-Leibler based cost function; MLLR adaptation; MLP adaptation; cross-lingual stochastic phone space transformations; destination phone posterior probability; hybrid HMM-MLP automatic speech recognition; iterative EM algorithm; knowledge-based mapping techniques; mapping posterior distributions; multilingual data; multilingual stochastic phone space transformations; nonnative English database; nonnative speech recognition; phone HMM state posterior probability; stochastic matrix transformation; triphone mapping; universal phone posteriors; Non-native speech recognition; multilingual acoustic modeling; universal phone set;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2260150
Filename :
6508849