Title :
Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
Author :
Kolossa, Dorothea ; Klimas, Aleksander ; Orglmeister, Reinhold
Author_Institution :
Electron. & Med. Signal Process., TU Berlin
Abstract :
Time-frequency masking has emerged as a powerful technique for source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition. However, while adequate masking functions can yield significant SNR gains, speech recognition performance suffers from the nonlinear operations involved, so the greatly improved SNR often contrasts with only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition, but they require speech recognition and source separation to be carried out in the same domain. Source separation and denoising, however, are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are, e.g., mel-frequency cepstral coefficients (MFCCs), LPC-cepstral coefficients and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested, which estimates sufficient statistics for speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to, e.g., the MFCCs of a speech recognition system, and uses the estimated statistics for missing data speech recognition. With this approach, significant gains can be achieved in speech recognition rates, and in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking based source separation.
Keywords :
Fourier transforms; convolution; source separation; speech intelligibility; speech recognition; statistics; LPC-cepstral coefficients; SNR; VQ-features; marginalization techniques; mel-frequency cepstral coefficients; missing data speech recognition; missing data techniques; noisy speech recognition; noisy-convolutive speech mixtures; short-time-Fourier-transform domain; source denoising; source separation; time-frequency masking; Noise reduction; Robustness; Signal processing; Source separation; Speech coding; Speech enhancement; Speech processing; Speech recognition; Statistics; Time frequency analysis;
Conference_Title :
Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on
Conference_Location :
New Paltz, NY
Print_ISBN :
0-7803-9154-3
DOI :
10.1109/ASPAA.2005.1540174
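
Illustration :
The abstract describes propagating feature statistics from the STFT domain toward recognition features. The sketch below is one possible way to illustrate that idea and is not taken from the paper: it assumes independent spectral bins, a toy filterbank matrix `W` with made-up values, and a log-normal moment-matching approximation for the logarithm. The function names `propagate_linear` and `propagate_log` are hypothetical.

```python
import numpy as np

def propagate_linear(mean, var, W):
    """Propagate mean and variance through y = W @ x.

    Assumes the components of x are independent, so only
    per-component variances (no covariances) are tracked.
    """
    return W @ mean, (W ** 2) @ var

def propagate_log(mean, var):
    """Approximate the mean and variance of log(y) by moment matching,
    under the assumption that y is log-normally distributed."""
    log_var = np.log1p(var / mean ** 2)
    log_mean = np.log(mean) - 0.5 * log_var
    return log_mean, log_var

# Toy data (hypothetical values): 4 spectral bins, 2 filterbank channels.
W = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.5]])
spec_mean = np.array([2.0, 1.0, 4.0, 2.0])  # estimated power per bin
spec_var = np.array([0.0, 0.5, 0.0, 1.0])   # uncertainty per bin

fb_mean, fb_var = propagate_linear(spec_mean, spec_var, W)
log_mean, log_var = propagate_log(fb_mean, fb_var)
print(fb_mean, fb_var)
print(log_mean, log_var)
```

A final DCT from log filterbank energies to cepstral coefficients is again linear, so `propagate_linear` could be reused for it under the same independence assumption. How the resulting variances enter the recognizer is left open here.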