• DocumentCode
    178606
  • Title
    Refinements of regression-based context-dependent modelling of deep neural networks for automatic speech recognition
  • Author
    Guangsen Wang; Khe Chai Sim

  • Author_Institution
    School of Computing, National University of Singapore, Singapore
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    3022
  • Lastpage
    3026
  • Abstract
    The data sparsity problem of context-dependent (CD) acoustic modelling with deep neural networks (DNNs) in speech recognition is addressed by using the decision tree state clusters as the training targets. As a result, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In our previous work, a regression-based CD-DNN framework was proposed to address both the data sparsity and the clustering problems. This paper investigates several refinements of the regression-based CD-DNN, including two more representative state approximation schemes and the incorporation of sequential learning. The two approximations are derived from statistics learned from the training data. Sequential learning is applied to both the broad phone DNN detectors and the regression NN. The proposed refinements are evaluated on a broadcast news transcription task. For the cross-entropy systems, the two approximations perform consistently better than our previous work. A consistent performance gain over the corresponding cross-entropy trained systems is also observed for both the baseline CD-DNN and the regression model with sequential learning. (An illustrative code sketch of the regression mapping follows this record.)
  • Keywords
    decision trees; learning (artificial intelligence); neural nets; regression analysis; speech recognition; CD acoustic modelling; baseline CD-DNN; broad phone DNN detectors; broadcast news transcription task; clustering problem; context-dependent acoustic modelling; cross-entropy trained systems; data sparsity problem; decision tree state clusters; deep neural networks; regression-based CD-DNN framework; representative state approximation schemes; sequential learning; speech recognition; training targets; Approximation methods; Detectors; Hidden Markov models; Mathematical model; Neural networks; Speech recognition; Training; Articulatory Features; Canonical State Modelling; Context Dependent Modelling; Deep Neural Network; Logistic Regression; Sequential Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854155
  • Filename
    6854155
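
The abstract above outlines a regression-based CD-DNN in which broad phone DNN detectors feed a regression network that scores context-dependent states, so that states tied to the same decision-tree cluster can still be distinguished during decoding. The toy sketch below illustrates only that general mapping; it is not the authors' implementation, and the class counts, the log-posterior features, and the softmax regression layer are assumptions made for illustration.

```python
# Toy sketch of a regression layer that maps broad-phone detector
# posteriors to context-dependent (CD) state scores. Not the authors'
# implementation: class counts, log-posterior features and the softmax
# regression layer are illustrative assumptions.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

n_frames = 5        # acoustic frames in a toy utterance
n_broad = 12        # broad-phone / detector classes (assumed)
n_cd_states = 30    # context-dependent states (assumed)

# Posteriors such as broad-phone DNN detectors might emit per frame.
detector_post = softmax(rng.normal(size=(n_frames, n_broad)))

# Regression parameters mapping detector evidence to CD states.
W = rng.normal(scale=0.1, size=(n_broad, n_cd_states))
b = np.zeros(n_cd_states)

# Use log posteriors as regression features (an assumed choice).
features = np.log(detector_post + 1e-10)

# CD state posteriors for decoding: states tied to the same decision-tree
# cluster receive distinct scores because their regression weights differ.
cd_post = softmax(features @ W + b)
print(cd_post.shape)  # (5, 30)
```

In the paper's framework the regression parameters would be trained, and per the refinements studied, sequence-trained, rather than randomly initialised as in this sketch.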