• DocumentCode
    721062
  • Title

    A Robust Acoustic Feature Extraction Approach Based on Stacked Denoising Autoencoder

  • Author

    Liu, J.H. ; Zheng, W.Q. ; Zou, Y.X.

  • Author_Institution
    Sch. of Electron. & Comput. Eng., Peking Univ., Shenzhen, China
  • fYear
    2015
  • fDate
    20-22 April 2015
  • Firstpage
    124
  • Lastpage
    127
  • Abstract
    Acoustic feature extraction (AFE) is considered one of the most challenging techniques for speech applications, since adverse environmental noise always causes significant variation in the extracted acoustic features. In this paper, we propose a systematic AFE approach based on a stacked denoising autoencoder (SDAE) that aims to extract acoustic features automatically. The denoising autoencoder (DAE), which is trained to reconstruct a clean, "repaired" input from a corrupted version of it, serves as the basic building block of the SDAE. In addition, a training set containing both clean and noisy speech gives the SDAE a strong ability to extract robust features under different noise conditions. Using a speaker classification task to evaluate the features extracted by the proposed approach, intensive experiments were conducted on TIMIT and NIST SRE 2004; they show that an SDAE with 3 hidden layers (3L-SDAE) performs better than shallower networks. The results also show that the features extracted by the 3L-SDAE perform better than MFCC features when the SNR is lower than 6 dB, and behave more robustly as the SNR decreases. Moreover, for different types of noise at an SNR of 0 dB, the accuracy of speaker classification using 3L-SDAE features is higher than about 84%, while that using MFCC features is lower than 77%.
  • Keywords
    feature extraction; signal denoising; speaker recognition; speech coding; robust acoustic feature extraction; speaker classification task; stacked denoising autoencoder; Mel frequency cepstral coefficient; Signal to noise ratio; Spectrogram; Speech; noisy environment; speaker classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Big Data (BigMM), 2015 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-8687-3
  • Type

    conf

  • DOI
    10.1109/BigMM.2015.46
  • Filename
    7153865
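
The abstract describes the DAE as a network trained to reconstruct a clean input from a corrupted version of it, with noise conditions specified by SNR. The following is a minimal sketch of that idea, not the authors' implementation: it shows a single DAE layer rather than the 3L-SDAE, and the function names (`add_noise_at_snr`, `train_dae`), the hidden-layer size, the learning rate, the plain gradient-descent training, and the use of synthetic feature frames are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dae(clean, noisy, n_hidden=8, lr=0.5, epochs=200, seed=0):
    """Train one denoising autoencoder layer (hypothetical settings).

    The network sees the noisy frames as input but is penalised on its
    distance to the clean frames, which is what makes it "denoising".
    Returns the parameters and the per-epoch mean squared error.
    """
    rng = np.random.default_rng(seed)
    n_in = clean.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        # Forward pass: sigmoid encoder, linear decoder.
        h = sigmoid(noisy @ W1 + b1)
        recon = h @ W2 + b2
        err = recon - clean
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        g_recon = 2.0 * err / err.size
        g_W2 = h.T @ g_recon
        g_b2 = g_recon.sum(axis=0)
        g_z = (g_recon @ W2.T) * h * (1.0 - h)
        g_W1 = noisy.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Full-batch gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def encode(params, frames):
    """Hidden-layer activations, i.e. the features a DAE layer would pass on."""
    W1, b1, _, _ = params
    return sigmoid(frames @ W1 + b1)
```

In a stacked arrangement, the output of `encode` for one trained layer would serve as the input to the next layer. Real use would replace the synthetic frames with spectral features computed from speech, which this sketch does not cover.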