Title :
Using sparse representations for missing data imputation in noise robust speech recognition
Author :
Gemmeke, J.F. ; Cranen, B.
Author_Institution :
Dept. of Linguistics, Radboud Univ., Nijmegen, Netherlands
Abstract :
The noise robustness of automatic speech recognition benefits from missing data imputation: prior to recognition, the parts of the spectrogram dominated by noise are replaced by clean speech estimates. At low SNRs in particular, each frame contains at best only a few uncorrupted coefficients. This makes frame-by-frame restoration of corrupted feature vectors error-prone, so recognition accuracy is mostly sub-optimal. In this paper we present a novel imputation technique that works on entire words. A word is sparsely represented in an overcomplete basis of exemplar (clean) speech signals using only the uncorrupted time-frequency elements of the word. The corrupted elements are then replaced by estimates obtained by projecting the sparse representation in the basis. On AURORA-2, using oracle masks, we achieve a recognition accuracy of 92% at an SNR of -5 dB, compared to 61% for a conventional frame-based approach. The performance obtained with estimated masks can be directly related to the proportion of correctly identified uncorrupted coefficients.
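
The imputation idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a sparse solver, so a plain orthogonal matching pursuit is swapped in here; the exemplar dictionary is random synthetic data rather than speech; and the names `omp`, `impute`, `reliable`, and `n_nonzero` are hypothetical. Each column of `A` stands in for one exemplar word (its time-frequency elements stacked into a vector), and `reliable` plays the role of an oracle mask.

```python
import numpy as np


def omp(A, y, n_nonzero):
    """Greedy orthogonal matching pursuit: find a sparse x with A @ x close to y.

    Assumption: this solver is a stand-in; the abstract does not say
    which sparse solver the paper uses.
    """
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0                    # avoid division by zero
    used = np.zeros(A.shape[1], dtype=bool)    # atoms already selected
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(n_nonzero):
        corr = np.abs(A.T @ residual) / norms  # normalized correlation per atom
        corr[used] = 0.0
        k = int(np.argmax(corr))
        if corr[k] <= 1e-12:                   # residual is already explained
            break
        used[k] = True
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x


def impute(A, y_noisy, reliable, n_nonzero=5):
    """Fit the sparse code on reliable elements only, then fill in the rest."""
    x = omp(A[reliable], y_noisy[reliable], n_nonzero)
    estimate = A @ x                           # reconstruction from the full basis
    # Keep reliable observations, replace corrupted ones by the estimate.
    return np.where(reliable, y_noisy, estimate), x


# Synthetic demo (hypothetical sizes): 200 elements per "word", 50 exemplars.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, 2.0, 1.5]          # the word uses three exemplars
clean = A @ x_true

reliable = rng.random(200) < 0.3               # oracle mask: ~30% uncorrupted
noise = 10.0 * rng.standard_normal(200)
noisy = np.where(reliable, clean, clean + noise)

imputed, x_hat = impute(A, noisy, reliable)
print("error before imputation:", np.linalg.norm(noisy - clean))
print("error after imputation: ", np.linalg.norm(imputed - clean))
```

Because the sparse code is fitted only on the rows selected by `reliable`, the quality of the estimate depends on how many uncorrupted elements the mask identifies, which mirrors the abstract's closing remark about estimated masks.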
Keywords :
signal restoration; speech recognition; time-frequency analysis; AURORA-2; SNR; clean speech estimation; corrupted feature vector error-prone; exemplar speech signals; frame-by-frame restoration; missing data imputation; noise robust automatic speech recognition accuracy; oracle masks; sparse representations; spectrogram; uncorrupted time-frequency elements; Accuracy; Reliability; Signal to noise ratio; Speech; Speech recognition; Time-frequency analysis; Vectors;
Conference_Title :
Signal Processing Conference, 2008 16th European
Conference_Location :
Lausanne