DocumentCode
672377
Title
DNN acoustic modeling with modular multi-lingual feature extraction networks
Author
Gehring, Jonas ; Quoc Bao Nguyen ; Metze, Florian ; Waibel, Alex
Author_Institution
Interactive Syst. Lab., Karlsruhe Inst. of Technol., Karlsruhe, Germany
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
344
Lastpage
349
Abstract
In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.
Keywords
feature extraction; natural language processing; neural nets; speech recognition; Cantonese language; DNN acoustic modeling; Pashto language; Tagalog language; Turkish language; Vietnamese language; acoustic model training; deep neural network; language specific acoustic features; modular multilingual feature extraction network; monolingual network; multilingual network; phoneme state posterior; single language system; training network; Acoustics; Adaptation models; Data models; Feature extraction; Hidden Markov models; Neural networks; Training; Deep Neural Networks; Large-Vocabulary Speech Recognition; Low-Resource Acoustic Modeling; Multi-Lingual Acoustic Modeling;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707754
Filename
6707754