• DocumentCode
    672377
  • Title
    DNN acoustic modeling with modular multi-lingual feature extraction networks
  • Author
    Gehring, Jonas ; Quoc Bao Nguyen ; Metze, Florian ; Waibel, Alex
  • Author_Institution
    Interactive Syst. Lab., Karlsruhe Inst. of Technol., Karlsruhe, Germany
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    344
  • Lastpage
    349
  • Abstract
    In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.
  • Keywords
    feature extraction; natural language processing; neural nets; speech recognition; Cantonese language; DNN acoustic modeling; Pashto language; Tagalog language; Turkish language; Vietnamese language; acoustic model training; deep neural network; language specific acoustic features; modular multilingual feature extraction network; monolingual network; multilingual network; phoneme state posterior; single language system; training network; Acoustics; Adaptation models; Data models; Feature extraction; Hidden Markov models; Neural networks; Training; Deep Neural Networks; Large-Vocabulary Speech Recognition; Low-Resource Acoustic Modeling; Multi-Lingual Acoustic Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type
    conf
  • DOI
    10.1109/ASRU.2013.6707754
  • Filename
    6707754
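
The modular setup described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class names, layer sizes, state counts, and random untrained weights are all assumptions chosen for illustration, and training is omitted. It only shows the data flow: a feature extractor whose hidden layers are shared across languages (with one output layer per language, used during extractor training), followed by a separately built network that maps the extracted features to phoneme state posteriors for the target language.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x):
    """Affine transform W x + b."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class MultilingualFeatureExtractor:
    """Hypothetical extractor: shared hidden layers, one output layer per language."""

    def __init__(self, n_input, n_hidden, n_feature, states_per_language, rng):
        self.shared = [make_layer(n_input, n_hidden, rng),
                       make_layer(n_hidden, n_feature, rng)]
        self.outputs = {lang: make_layer(n_feature, n_states, rng)
                        for lang, n_states in states_per_language.items()}

    def features(self, x):
        """High-level features from the shared layers (language-independent)."""
        for layer in self.shared:
            x = sigmoid(dense(layer, x))
        return x

    def posteriors(self, x, language):
        """Language-specific outputs, needed only while training the extractor."""
        return softmax(dense(self.outputs[language], self.features(x)))


class StateClassifier:
    """Hypothetical second network: extracted features -> state posteriors."""

    def __init__(self, n_feature, n_hidden, n_states, rng):
        self.hidden = make_layer(n_feature, n_hidden, rng)
        self.output = make_layer(n_hidden, n_states, rng)

    def posteriors(self, feats):
        return softmax(dense(self.output, sigmoid(dense(self.hidden, feats))))


rng = random.Random(0)
# All sizes and state counts below are made up for the sketch.
extractor = MultilingualFeatureExtractor(
    n_input=40, n_hidden=64, n_feature=16,
    states_per_language={"vietnamese": 50, "turkish": 60}, rng=rng)
classifier = StateClassifier(n_feature=16, n_hidden=32, n_states=50, rng=rng)

frame = [rng.gauss(0.0, 1.0) for _ in range(40)]  # one acoustic feature vector
target_posteriors = classifier.posteriors(extractor.features(frame))
```

In a hybrid DNN/HMM decoder, `target_posteriors` would stand in for the per-frame state posteriors passed on to the HMM; how the posteriors are scaled and decoded is outside the scope of this sketch.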