• DocumentCode
    730706
  • Title
    Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR
  • Author
    Tjandra, Andros ; Sakti, Sakriani ; Neubig, Graham ; Toda, Tomoki ; Adriani, Mirna ; Nakamura, Satoshi
  • Author_Institution
    Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    4525
  • Lastpage
    4529
  • Abstract
    This paper explores the use of auditory features based on cochleograms, two-dimensional speech features derived from gammatone filters, within the convolutional neural network (CNN) framework. We also propose several ways to combine cochleogram features with log-mel filter bank or spectrogram features. In particular, we combine them at low and high levels of the CNN framework, which we refer to as low-level and high-level feature combination. For comparison, we also construct a similar configuration with a deep neural network (DNN). Performance was evaluated within a hybrid neural network-hidden Markov model (NN-HMM) system on the TIMIT phoneme sequence recognition task. The results reveal that cochleogram-spectrogram feature combination provides significant advantages. The best accuracy was obtained by high-level combination of two-dimensional cochleogram-spectrogram features using a CNN, achieving up to 8.2% relative phoneme error rate (PER) reduction over CNN single features and 19.7% relative PER reduction over DNN single features.
  • Keywords
    feature extraction; hidden Markov models; neural net architecture; speech recognition; 2D speech features; TIMIT phoneme sequence recognition task; auditory feature; convolutional neural network; deep learning based ASR; deep neural network; gammatone filters; hidden Markov model system; log-mel filter banks; spectrogram feature; two dimensional cochleogram feature; Acoustics; Convolution; Hidden Markov models; Neural networks; Spectrogram; Speech; Speech recognition; DNN and CNN; Deep learning; cochleogram; feature combination
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178827
  • Filename
    7178827
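
The abstract reports its gains as relative phoneme error rate (PER) reduction. As a minimal sketch of how such a figure is conventionally computed, the snippet below divides the absolute improvement by the baseline PER. The function name and the input values are hypothetical illustrations; the record does not give the absolute PER values behind the 8.2% and 19.7% figures, so none are assumed here.

```python
def relative_per_reduction(baseline_per: float, new_per: float) -> float:
    """Return the relative PER reduction, as a percentage of the baseline PER."""
    if baseline_per <= 0:
        raise ValueError("baseline_per must be positive")
    return 100.0 * (baseline_per - new_per) / baseline_per


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline PER of 25.0% improved to 20.0% is a 20% relative reduction.
example = relative_per_reduction(25.0, 20.0)
print(f"{example:.1f}% relative PER reduction")
```

A relative reduction is always larger than the corresponding absolute difference in percentage points when the baseline PER is below 100%, so the two should not be compared directly.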