  • DocumentCode
    3162209
  • Title
    Auto-encoder bottleneck features using deep belief networks
  • Author
    Sainath, Tara N. ; Kingsbury, Brian ; Ramabhadran, Bhuvana
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4153
  • Lastpage
    4156
  • Abstract
    Neural network (NN) bottleneck (BN) features are typically created by training an NN with a middle bottleneck layer. Recently, an alternative structure was proposed that trains an NN with a constant number of hidden units to predict output targets and then reduces the dimensionality of these output probabilities through an auto-encoder, creating auto-encoder bottleneck (AE-BN) features. The benefit of placing the BN after the posterior estimation network is that it avoids the loss in frame classification accuracy incurred by networks that place the BN before the softmax. In this work, we investigate the use of pre-training when creating AE-BN features. Our experiments indicate that, with the AE-BN architecture, pre-trained and deeper NNs produce better AE-BN features. On a 50-hour English Broadcast News task, the AE-BN features provide an absolute improvement of more than 1% over both a state-of-the-art GMM/HMM system with a WER of 18.8% and a pre-trained NN hybrid system with a WER of 18.4%. In addition, on a larger 430-hour Broadcast News task, AE-BN features provide a 0.5% absolute improvement over a strong GMM/HMM baseline with a WER of 16.0%. Finally, combining the GMM/HMM baseline and AE-BN systems provides an additional 0.5% absolute improvement on 430 hours over the AE-BN system alone, yielding a final WER of 15.0%.
  • Keywords
    Gaussian distribution; belief networks; estimation theory; hidden Markov models; neural nets; AE-BN architecture; AE-BN features; English Broadcast News task; auto-encoder bottleneck features; deep belief networks; frame classification accuracy; middle bottleneck layer; neural network bottleneck features; posterior estimation network; pre-trained NN hybrid system; softmax; state-of-the-art GMM/HMM; Acoustics; Adaptation models; Artificial neural networks; Feature extraction; Hidden Markov models; Speech; Training; Deep Belief Networks; Speech Recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6288833
  • Filename
    6288833
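
The data flow described in the abstract can be illustrated with a minimal sketch: a posterior network with hidden layers of constant width ends in a softmax, and an auto-encoder then compresses those output probabilities, with the bottleneck activations serving as the AE-BN features. Everything specific here is an assumption made for illustration and is not taken from the record: the layer sizes, the sigmoid non-linearity, the random untrained weights, and the function names. Pre-training and the decoder half of the auto-encoder are omitted.

```python
import math
import random


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-x)) for x in values]


def softmax(values):
    """Numerically stable softmax over a single vector."""
    peak = max(values)
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [x / total for x in exps]


def affine(vector, layer):
    """Apply one layer's weights (one row per output unit) and bias."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def init_layers(sizes, rng):
    """Random, untrained layers; stands in for real (pre-)training."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def posterior_network(frame, layers):
    """Constant-width hidden layers followed by a softmax over targets."""
    hidden = frame
    for layer in layers[:-1]:
        hidden = sigmoid(affine(hidden, layer))
    return softmax(affine(hidden, layers[-1]))


def ae_bn_features(posteriors, encoder):
    """Encoder half of the auto-encoder; the last layer is the bottleneck."""
    hidden = posteriors
    for layer in encoder:
        hidden = sigmoid(affine(hidden, layer))
    return hidden


# Assumed toy sizes, chosen only to keep the example small.
INPUT_DIM, HIDDEN_DIM, NUM_TARGETS, BN_DIM = 20, 32, 16, 4

rng = random.Random(0)
net = init_layers([INPUT_DIM, HIDDEN_DIM, HIDDEN_DIM, NUM_TARGETS], rng)
encoder = init_layers([NUM_TARGETS, 8, BN_DIM], rng)

frame = [rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)]
posteriors = posterior_network(frame, net)
features = ae_bn_features(posteriors, encoder)
print(len(posteriors), len(features))
```

Because the bottleneck sits after the softmax, the posterior network keeps its full hidden width all the way to the output targets, which is the property the abstract credits for avoiding a loss in frame classification accuracy.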