• DocumentCode
    67959
  • Title
    The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition
  • Author
    Yu, Dong ; Deng, Li ; Seide, Frank
  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • Volume
    21
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    388
  • Lastpage
    396
  • Abstract
    The recently proposed context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have proven highly promising for large vocabulary speech recognition. In this paper, we develop a more advanced type of DNN, which we call the deep tensor neural network (DTNN). The DTNN extends the conventional DNN by replacing one or more of its layers with a double-projection (DP) layer, in which each input vector is projected into two nonlinear subspaces, and a tensor layer, in which the two subspace projections interact with each other and jointly predict the next layer in the deep architecture. In addition, we describe an approach to map the tensor layers to conventional sigmoid layers so that the former can be treated and trained in a similar way to the latter. With this mapping, a DTNN can be viewed as a DNN augmented with DP layers, so that the backpropagation (BP) learning algorithm for DTNNs can be cleanly derived and new types of DTNNs can be developed more easily. Evaluation on Switchboard tasks indicates that DTNNs can outperform the already high-performing DNNs, with relative word error reductions of 4-5% and 3% using the 30-hr and 309-hr training sets, respectively.
  • Keywords
    backpropagation; hidden Markov models; neural nets; speech recognition; tensors; BP learning algorithm; CD-DNN-HMM; DP layer; DTNN; context-dependent deep neural network hidden Markov models; deep architecture; deep tensor neural network; double-projection layer; large vocabulary speech recognition; nonlinear subspaces; relative word error reduction; sigmoid layers; subspace projections; Switchboard tasks; time 30 hour; time 309 hr; Neural networks; Speech; Speech processing; Speech recognition; Tensile stress; Vectors; Vocabulary; Automatic speech recognition; large vocabulary; tensor deep neural networks
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2227738
  • Filename
    6353550
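
The abstract describes a double-projection (DP) layer feeding a tensor layer, and a mapping that lets the tensor layer be treated like a conventional sigmoid layer. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the tensor layer is a bilinear form over the two projections followed by a sigmoid, and that the mapping amounts to flattening the outer product of the two projections. The abstract gives no equations, so these forms, all function names, the layer sizes, and the random weights are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine_sigmoid(weights, bias, vec):
    """Conventional sigmoid layer: sigmoid(W @ vec + b), with W as a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def double_projection(w1, b1, w2, b2, vec):
    """DP layer (assumed form): project the same input into two nonlinear subspaces."""
    return affine_sigmoid(w1, b1, vec), affine_sigmoid(w2, b2, vec)


def tensor_layer(u, bias, h1, h2):
    """Tensor layer (assumed form): unit k sees sum_ij u[k][i][j] * h1[i] * h2[j]."""
    out = []
    for u_k, b in zip(u, bias):
        z = sum(u_k[i][j] * h1[i] * h2[j]
                for i in range(len(h1)) for j in range(len(h2)))
        out.append(sigmoid(z + b))
    return out


def tensor_layer_as_sigmoid(u, bias, h1, h2):
    """Same computation as a conventional sigmoid layer on the flattened outer product."""
    outer = [a * b for a in h1 for b in h2]  # vec(h1 outer h2), row-major
    flat_w = [[u_k[i][j] for i in range(len(h1)) for j in range(len(h2))]
              for u_k in u]                  # 3-way tensor reshaped to a matrix
    return affine_sigmoid(flat_w, bias, outer)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 5 inputs, projections of size 3 and 4, 2 output units.
rng = random.Random(0)
n_in, k1, k2, n_out = 5, 3, 4, 2
x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
w1, b1 = rand_matrix(k1, n_in, rng), [0.0] * k1
w2, b2 = rand_matrix(k2, n_in, rng), [0.0] * k2
u = [rand_matrix(k1, k2, rng) for _ in range(n_out)]
b_out = [0.0] * n_out

h1, h2 = double_projection(w1, b1, w2, b2, x)
direct = tensor_layer(u, b_out, h1, h2)
mapped = tensor_layer_as_sigmoid(u, b_out, h1, h2)
```

Under these assumptions, `direct` and `mapped` agree up to floating-point rounding, which illustrates why a tensor layer rewritten this way can reuse the training machinery of an ordinary sigmoid layer.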