DocumentCode
672386
Title
Deep maxout networks for low-resource speech recognition
Author
Miao, Yajie ; Metze, Florian ; Rawat, Seema
Author_Institution
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear
2013
fDate
8-12 Dec. 2013
Firstpage
398
Lastpage
403
Abstract
As a feed-forward architecture, the recently proposed maxout networks integrate dropout naturally and show state-of-the-art results on various computer vision datasets. This paper investigates the application of deep maxout networks (DMNs) to large vocabulary continuous speech recognition (LVCSR) tasks. Our focus is on the particular advantage of DMNs under low-resource conditions with limited transcribed speech. We extend DMNs to hybrid and bottleneck feature systems, and explore optimal network structures (number of maxout layers, pooling strategy, etc.) for both setups. On the newly released Babel corpus, the behavior of DMNs is studied extensively under different levels of data availability. Experiments show that DMNs improve low-resource speech recognition significantly. Moreover, DMNs introduce sparsity to their hidden activations and thus can act as sparse feature extractors.
Keywords
computer vision; feature extraction; feedforward; speech recognition; LVCSR tasks; computer vision datasets; deep maxout networks; feedforward architecture; large vocabulary continuous speech recognition; limited transcribed speech; low-resource speech recognition; sparse feature extractors; Acoustics; Hidden Markov models; Speech; Training; Training data; deep learning; low-resource conditions
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location
Olomouc
Type
conf
DOI
10.1109/ASRU.2013.6707763
Filename
6707763