• DocumentCode
    3131990
  • Title

    Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM

  • Author

    Jinyu Li ; Dong Yu ; Jui-Ting Huang ; Yifan Gong

  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    131
  • Lastpage
    136
  • Abstract
    Context-dependent deep neural network hidden Markov model (CD-DNN-HMM) is a recently proposed acoustic model that significantly outperformed Gaussian mixture model (GMM)-HMM systems in many large vocabulary speech recognition (LVSR) tasks. In this paper we present our strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework. We show that DNNs provide the flexibility of using arbitrary features. By using the Mel-scale log-filter bank features we not only achieve higher recognition accuracy than using MFCCs, but also can formulate the mixed-bandwidth training problem as a missing feature problem, in which several feature dimensions have no value when narrowband speech is presented. This treatment makes training CD-DNN-HMMs with mixed-bandwidth data an easy task since no bandwidth extension is needed. Our experiments on voice search data indicate that the proposed solution not only provides higher recognition accuracy for the wideband speech but also allows the same CD-DNN-HMM to recognize mixed-bandwidth speech. By exploiting mixed-bandwidth training data CD-DNN-HMM outperforms fMPE+BMMI trained GMM-HMM, which cannot benefit from using narrowband data, by 18.4%.
  • Keywords
    channel bank filters; hidden Markov models; neural nets; speech recognition; vocabulary; CD-DNN-HMM; LVSR; acoustic model; context-dependent deep neural network hidden Markov model; feature dimensions; large vocabulary speech recognition; mel scale log filter bank features; mixed-bandwidth training data; narrowband speech; voice search data; wideband speech recognition; Filter banks; Narrowband; Speech; Speech recognition; Training; Training data; Wideband; CD-DNN-HMM; deep neural network; log filter bank; mixed-bandwidth; narrowband; wideband;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type

    conf

  • DOI
    10.1109/SLT.2012.6424210
  • Filename
    6424210
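
The missing-feature formulation described in the abstract can be illustrated with a minimal sketch. The record states only that several log-filter-bank dimensions have no value for narrowband speech. The channel counts (29 wideband, 22 narrowband) and the choice to fill the missing channels with zeros are assumptions made here for illustration, and the function names are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence

# Hypothetical sizes: the record does not give the number of filter-bank channels.
WIDEBAND_CHANNELS = 29    # channels covering the full wideband spectrum (assumed)
NARROWBAND_CHANNELS = 22  # lower channels still observable in narrowband speech (assumed)


def pad_narrowband_frame(frame: Sequence[float],
                         total: int = WIDEBAND_CHANNELS,
                         fill: float = 0.0) -> List[float]:
    """Extend a narrowband log-filter-bank frame to the wideband dimension.

    The upper channels, which narrowband speech cannot provide, are set to
    `fill`. Zero filling is an assumption of this sketch.
    """
    if len(frame) > total:
        raise ValueError("frame has more channels than the wideband layout")
    return list(frame) + [fill] * (total - len(frame))


def build_mixed_batch(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Bring wideband and narrowband frames to one common input dimension."""
    return [pad_narrowband_frame(f) for f in frames]


# Usage: one wideband frame and one narrowband frame share a single batch.
wideband = [1.0] * WIDEBAND_CHANNELS
narrowband = [2.0] * NARROWBAND_CHANNELS
batch = build_mixed_batch([wideband, narrowband])
```

Because every frame ends up with the same dimension, a single network input layer can accept both kinds of data, which is consistent with the abstract's point that no bandwidth extension step is needed.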