  • DocumentCode
    178394
  • Title
    Analysis-by-synthesis feature estimation for robust automatic speech recognition using spectral masks
  • Author
    Mandel, Michael I.; Narayanan, Arun
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2509
  • Lastpage
    2513
  • Abstract
    Spectral masking is a promising method for noise suppression in which regions of the spectrogram dominated by noise are attenuated while regions dominated by speech are preserved. It is not clear, however, how best to combine spectral masking with the nonlinear processing needed to compute automatic speech recognition features. We propose an analysis-by-synthesis approach to automatic speech recognition that, given a spectral mask, poses the estimation of the mel frequency cepstral coefficients (MFCCs) of the clean speech as an optimization problem. The MFCCs are chosen to minimize a combination of two terms: the distance from the resynthesized clean power spectrum to the regions of the noisy spectrum selected by the mask, and the negative log likelihood under an unmodified large-vocabulary continuous speech recognizer. In evaluations on the Aurora4 noisy speech recognition task with both ideal and estimated masks, analysis-by-synthesis reduces both word error rates and distances to clean speech compared with traditional approaches.
  • Keywords
    cepstral analysis; feature extraction; optimisation; speech recognition; vocabulary; Aurora4 noisy speech recognition task; MFCC; analysis-by-synthesis feature estimation; mel frequency cepstral coefficients; negative log likelihood; noise suppression; nonlinear processing; optimization problem; robust automatic speech recognition; spectral masking; spectrogram regions; unmodified large vocabulary; word error rates; Hidden Markov models; Lattices; Noise; Optimization; Speech; Speech processing; Speech recognition; analysis-by-synthesis; large vocabulary automatic speech recognition; missing data; time-frequency masking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854052
  • Filename
    6854052
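The abstract frames feature estimation as an optimization over MFCCs that balances fidelity to the mask-selected (reliable) parts of the noisy spectrum against a recognizer likelihood. The following is a minimal sketch of that structure under simplifying assumptions that are not taken from the paper: a single frame, the mask applied in the log-mel domain rather than to the power spectrum, an orthonormal DCT as the only synthesis step, and a diagonal Gaussian prior over MFCCs standing in for the recognizer's negative log likelihood. Under these assumptions the objective is quadratic in the MFCCs, so it has a closed-form minimizer. The function names (`dct_matrix`, `objective`, `estimate_mfcc`) and the trade-off weight `lam` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix D: c = D @ x and x = D.T @ c."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D


def objective(c, noisy_logmel, mask, prior_mean, prior_var, lam=1.0):
    """Simplified analysis-by-synthesis cost for one frame (assumed form).

    fit: squared error between the resynthesised log-mel frame D.T @ c and
         the noisy frame, counted only on bins the mask marks as reliable.
    nll: diagonal-Gaussian negative log likelihood of c (up to a constant),
         a hypothetical stand-in for the recognizer's score.
    """
    D = dct_matrix(len(noisy_logmel))
    resid = D.T @ c - noisy_logmel
    fit = np.sum(mask * resid ** 2)
    nll = 0.5 * np.sum((c - prior_mean) ** 2 / prior_var)
    return fit + lam * nll


def estimate_mfcc(noisy_logmel, mask, prior_mean, prior_var, lam=1.0):
    """Closed-form minimiser of `objective` (valid because it is quadratic in c)."""
    y = np.asarray(noisy_logmel, dtype=float)
    D = dct_matrix(y.shape[0])
    W = np.diag(np.asarray(mask, dtype=float))
    P = np.diag(1.0 / np.asarray(prior_var, dtype=float))
    mu = np.asarray(prior_mean, dtype=float)
    # Setting the gradient 2 D W (D.T c - y) + lam P (c - mu) to zero gives
    # (2 D W D.T + lam P) c = 2 D W y + lam P mu.
    A = 2.0 * D @ W @ D.T + lam * P
    b = 2.0 * D @ W @ y + lam * P @ mu
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    clean = rng.normal(size=n)
    noisy = clean.copy()
    noisy[5:] += 3.0  # synthetic noise on the top bins
    mask = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=float)
    mu, var = np.zeros(n), np.ones(n)
    c_hat = estimate_mfcc(noisy, mask, mu, var, lam=0.1)
    print(objective(c_hat, noisy, mask, mu, var, lam=0.1))
```

With an all-zero mask the estimate falls back to the prior mean, and with an all-one mask and a negligible `lam` it reduces to the ordinary DCT of the observed log-mel frame. The objective described in the abstract, which compares resynthesized power spectra and scores candidates with a full recognizer, is not quadratic in this way and would generally call for an iterative optimizer.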