• DocumentCode
    730671
  • Title
    Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data
  • Author
    Yong Zhao; Jinyu Li; Jian Xue; Yifan Gong
  • Author_Institution
    Microsoft Corp., Redmond, WA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4310
  • Lastpage
    4314
  • Abstract
    To develop speaker adaptation algorithms for deep neural networks (DNNs) that are suitable for large-scale online deployment, it is desirable that the adaptation model be represented in a compact form and learned in an unsupervised fashion. In this paper, we propose a novel low-footprint adaptation technique for DNNs that adapts the DNN model through its node activation functions. The approach introduces speaker-specific slope and bias parameters into the sigmoid activation functions, allowing each speaker's adaptation model to be stored in very little space. We show that this adaptation technique can be formulated in a linear regression fashion, analogous to other speaker adaptation algorithms that apply additional linear transformations to the DNN layers. We further investigate semi-supervised online adaptation by using user click-through data as the supervision signal. The proposed method is evaluated on short message dictation and voice search tasks in supervised, unsupervised, and semi-supervised setups. Compared with singular value decomposition (SVD) bottleneck adaptation, the proposed adaptation method achieves comparable accuracy improvements with a much smaller footprint.
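    The abstract states that slope/bias adaptation of the sigmoid can be written in linear-regression form. The minimal Python sketch below illustrates that idea under stated assumptions: it assumes each hidden node's adapted activation is sigmoid(alpha_i * z_i + beta_i), where z_i is the speaker-independent pre-activation. That exact parametrization, the helper names (adapted_layer, fold_into_linear_transform, slope_bias_footprint, svd_bottleneck_footprint), the toy weights, and the layer sizes are hypothetical and not taken from the paper. The sketch checks that the adaptation equals a diagonal linear transform of the layer (diag(alpha) W and alpha * c + beta). It then counts per-speaker parameters against an SVD bottleneck setup that is assumed to adapt one k x k matrix per layer.

```python
import math
import random

def sigmoid(v: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, c, x):
    """Speaker-independent pre-activation z = W x + c for one hidden layer."""
    return [sum(w * xj for w, xj in zip(row, x)) + ci for row, ci in zip(W, c)]

def adapted_layer(W, c, x, alpha, beta):
    """Assumed speaker-adapted sigmoid: h_i = sigmoid(alpha_i * z_i + beta_i).

    alpha (slope) and beta (bias) are the only per-speaker parameters;
    alpha = 1, beta = 0 recovers the speaker-independent layer.
    """
    z = affine(W, c, x)
    return [sigmoid(a * zi + b) for a, zi, b in zip(alpha, z, beta)]

def fold_into_linear_transform(W, c, alpha, beta):
    """Rewrite the slope/bias adaptation as a linear transform of the layer:
    W_s = diag(alpha) W and c_s = alpha * c + beta (element-wise)."""
    W_s = [[a * w for w in row] for a, row in zip(alpha, W)]
    c_s = [a * ci + b for a, ci, b in zip(alpha, c, beta)]
    return W_s, c_s

def slope_bias_footprint(hidden_sizes):
    """Per-speaker parameters: one slope and one bias per hidden node."""
    return sum(2 * n for n in hidden_sizes)

def svd_bottleneck_footprint(k, num_layers):
    """Per-speaker parameters if each adapted layer gets one k x k matrix (assumption)."""
    return k * k * num_layers

# Toy check with random, hypothetical weights.
random.seed(0)
n_in, n_hid = 4, 3
W = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
c = [random.uniform(-1, 1) for _ in range(n_hid)]
x = [random.uniform(-1, 1) for _ in range(n_in)]
alpha = [1.2, 0.8, 1.0]
beta = [0.1, -0.2, 0.0]

h_adapted = adapted_layer(W, c, x, alpha, beta)
W_s, c_s = fold_into_linear_transform(W, c, alpha, beta)
h_folded = [sigmoid(z) for z in affine(W_s, c_s, x)]
assert all(abs(p - q) < 1e-12 for p, q in zip(h_adapted, h_folded))

# Hypothetical sizes: 5 hidden layers of 2048 nodes vs. a rank-256 bottleneck in each.
print(slope_bias_footprint([2048] * 5))   # 20480
print(svd_bottleneck_footprint(256, 5))   # 327680
```

    Because the folded transform is diagonal, only two numbers per hidden node need to be stored for each speaker, so the footprint grows linearly with layer width. An adapted square matrix grows quadratically in its dimension, which illustrates (under the assumptions above) why this kind of adaptation can be much smaller than bottleneck-matrix adaptation.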
  • Keywords
    neural nets; regression analysis; speech recognition; click-through data; deep neural network; generalized linear regression; linear regression fashion; linear transformations; node activation functions; online low-footprint speaker adaptation; short message dictation; sigmoid activation functions; singular value decomposition; voice search tasks; Adaptation models; Hidden Markov models; Neural networks; Silicon; Speech; Speech recognition; Training; automatic speech recognition; deep neural network; low footprint; speaker adaptation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178784
  • Filename
    7178784