• DocumentCode
    730669
  • Title
    An investigation into speaker informed DNN front-end for LVCSR
  • Author
    Liu, Yulan ; Karanasou, Penny ; Hain, Thomas
  • Author_Institution
    Speech & Hearing Res. Group, Univ. of Sheffield, Sheffield, UK
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4300
  • Lastpage
    4304
  • Abstract
    Deep neural networks (DNNs) have become a standard method in many ASR tasks. Recently there has been considerable interest in “informed training” of DNNs, where the DNN input is augmented with auxiliary codes such as i-vectors, speaker codes, or speaker separation bottleneck (SSBN) features. This paper compares different speaker informed DNN training methods on an LVCSR task. We discuss the mathematical equivalence between speaker informed DNN training and “bias adaptation”, which uses speaker dependent biases, and give a detailed analysis of influential factors such as the dimension, discrimination and stability of the auxiliary codes. The analysis is supported by experiments on a meeting recognition task using a bottleneck feature based system. Results show that i-vector based adaptation is also effective in bottleneck feature based systems, not just in hybrid systems. However, all tested methods generalise poorly to unseen speakers. We introduce a system based on speaker classification followed by speaker adaptation of biases, which yields performance equivalent to an i-vector based system, with a 10.4% relative improvement over the baseline on seen speakers. The new approach can serve as a fast alternative, especially for short utterances.
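    The equivalence mentioned in the abstract can be illustrated with a minimal sketch. It assumes the auxiliary code is concatenated to the acoustic features at the input layer and is constant for a given speaker; the paper's own formulation is not given in this record. All names and dimensions (W_x, W_c, feat_dim, code_dim, hidden_dim) are hypothetical and chosen only for illustration. Under that assumption, the first-layer pre-activation W [x; c] + b equals W_x x + (W_c c + b), so the code acts as a speaker dependent bias.

```python
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

random.seed(0)
feat_dim, code_dim, hidden_dim = 4, 3, 5  # hypothetical toy sizes

# First-layer weights, split into the columns that see the acoustic
# features (W_x) and the columns that see the auxiliary code (W_c).
W_x = [[random.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(hidden_dim)]
W_c = [[random.uniform(-1, 1) for _ in range(code_dim)] for _ in range(hidden_dim)]
b = [random.uniform(-1, 1) for _ in range(hidden_dim)]

x = [random.uniform(-1, 1) for _ in range(feat_dim)]  # one acoustic frame
c = [random.uniform(-1, 1) for _ in range(code_dim)]  # one speaker's code

# View 1: "informed" input, the code is appended to the features.
W_aug = [row_x + row_c for row_x, row_c in zip(W_x, W_c)]
pre_informed = add(matvec(W_aug, x + c), b)

# View 2: plain input with a speaker dependent bias W_c c + b.
speaker_bias = add(matvec(W_c, c), b)
pre_bias = add(matvec(W_x, x), speaker_bias)

# The two views agree up to floating-point rounding.
max_diff = max(abs(p - q) for p, q in zip(pre_informed, pre_bias))
print(max_diff)
```

    Because the bias term depends only on the speaker's code, it could in principle be computed once per speaker and reused for every frame. This sketch covers only the first-layer identity and not the training or adaptation procedures compared in the paper.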
  • Keywords
    neural nets; speech recognition; ASR tasks; LVCSR; auxiliary codes discrimination; auxiliary codes stability; deep neural network; informed training; meeting recognition task; speaker classification; speaker informed DNN front end; speaker separation bottleneck; Acoustics; Hidden Markov models; Neural networks; Speech; Speech processing; Speech recognition; Training; bias adaptation; speaker adaptation; speaker informed training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178782
  • Filename
    7178782