• DocumentCode
    36358
  • Title
    Applying Multi- and Cross-Lingual Stochastic Phone Space Transformations to Non-Native Speech Recognition
  • Author
    Imseng, David ; Bourlard, Herve ; Dines, John ; Garner, Philip N. ; Magimai-Doss, Mathew
  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • Volume
    21
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    1713
  • Lastpage
    1726
  • Abstract
    In the context of hybrid HMM/MLP automatic speech recognition (ASR), this paper investigates a new type of stochastic phone space transformation, which maps “source” phone (or phone HMM state) posterior probabilities, as obtained at the output of a multilayer perceptron (MLP), into “destination” phone (or phone HMM state) posterior probabilities. The resulting stochastic matrix transformation can be used within the same language to adapt automatically to different phone formats (e.g., IPA) or across languages. As shown here, it can also be applied successfully to non-native speech recognition. In the same spirit as MLLR or MLP adaptation, the proposed approach directly maps posterior distributions; it is trained on a small amount of adaptation data by optimizing a Kullback-Leibler-based cost function with a modified version of the iterative EM algorithm. On a non-native English database (HIWIRE), and in comparison with multiple setups (monophone and triphone mapping, MLLR adaptation), we show that the resulting posterior mapping yields state-of-the-art results using very limited amounts of adaptation data in mono-, cross- and multi-lingual setups. We also show that “universal” phone posteriors, trained on a large amount of multilingual data, can be transformed into English phone posteriors, resulting in an ASR system that significantly outperforms a system trained on English data only. Finally, we demonstrate that the proposed approach outperforms alternative data-driven mapping techniques as well as a knowledge-based one.
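The core idea in the abstract, a column-stochastic matrix W that maps a source posterior vector p to destination posteriors W·p and is fitted by minimizing a KL cost over adaptation frames with EM, can be illustrated with a minimal sketch. This is not the authors' modified EM algorithm: it assumes soft per-frame destination targets q_t and uses the standard EM update for a mixture of conditionals, where minimizing the sum of KL(q_t || W·p_t) is equivalent to maximizing the sum over frames and destination classes of q_t[k]·log (W·p_t)[k]. The function names `apply_mapping`, `kl_cost` and `train_mapping` are hypothetical.

```python
import math
import random


def apply_mapping(W, p_src):
    """Map source posteriors to destination posteriors: p_dst[k] = sum_j W[k][j] * p_src[j]."""
    return [sum(W[k][j] * p_src[j] for j in range(len(p_src))) for k in range(len(W))]


def kl_cost(W, src_posts, dst_targets, eps=1e-12):
    """Sum over frames of KL(target || W p_src); eps guards log(0) (assumption)."""
    total = 0.0
    for p, q in zip(src_posts, dst_targets):
        m = apply_mapping(W, p)
        total += sum(qk * math.log(qk / max(mk, eps)) for qk, mk in zip(q, m) if qk > 0)
    return total


def train_mapping(src_posts, dst_targets, n_dst, n_iter=50):
    """Generic EM sketch (hypothetical, not the paper's modified EM).

    W[k][j] plays the role of P(dest k | source j); every column sums to 1.
    """
    n_src = len(src_posts[0])
    # Uniform column-stochastic initialisation.
    W = [[1.0 / n_dst] * n_src for _ in range(n_dst)]
    for _ in range(n_iter):
        counts = [[0.0] * n_src for _ in range(n_dst)]
        for p, q in zip(src_posts, dst_targets):
            for k in range(n_dst):
                if q[k] == 0.0:
                    continue
                # E-step: posterior over the latent source class j given destination k.
                joint = [W[k][j] * p[j] for j in range(n_src)]
                z = sum(joint)
                if z <= 0.0:
                    continue
                for j in range(n_src):
                    counts[k][j] += q[k] * joint[j] / z
        # M-step: renormalise each column so that sum_k W[k][j] = 1.
        for j in range(n_src):
            col = sum(counts[k][j] for k in range(n_dst))
            if col > 0.0:
                for k in range(n_dst):
                    W[k][j] = counts[k][j] / col
    return W


if __name__ == "__main__":
    # Synthetic illustration only: 3 source classes, 2 destination classes.
    random.seed(0)
    W_true = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]]
    src = []
    for _ in range(200):
        v = [random.random() + 1e-3 for _ in range(3)]
        s = sum(v)
        src.append([x / s for x in v])
    tgt = [apply_mapping(W_true, p) for p in src]
    W0 = [[0.5] * 3 for _ in range(2)]
    W = train_mapping(src, tgt, n_dst=2)
    print("KL before:", kl_cost(W0, src, tgt), "KL after:", kl_cost(W, src, tgt))
```

Because each EM iteration cannot increase the KL cost, the trained matrix should fit the adaptation targets at least as well as the uniform starting point, while every column remains a valid probability distribution over destination classes.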
  • Keywords
    expectation-maximisation algorithm; hidden Markov models; iterative methods; matrix algebra; probability; speech recognition; statistical distributions; ASR system; HIWIRE; Kullback-Leibler based cost function; MLLR adaptation; MLP adaptation; cross-lingual stochastic phone space transformations; destination phone posterior probability; hybrid HMM-MLP automatic speech recognition; iterative EM algorithm; knowledge-based mapping techniques; mapping posterior distributions; multilingual data; multilingual stochastic phone space transformations; nonnative English database; nonnative speech recognition; phone HMM state posterior probability; stochastic matrix transformation; triphone mapping; universal phone posteriors; Non-native speech recognition; multilingual acoustic modeling; universal phone set;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2260150
  • Filename
    6508849