Title :
Temporal Modulation Normalization for Robust Speech Feature Extraction and Recognition
Author :
Lu, Xugang ; Matsuda, Shigeki ; Unoki, Masashi ; Nakamura, Satoshi
Author_Institution :
Nat. Inst. of Inf. & Commun. Technol., Japan
Abstract :
Traditional noise reduction methods are usually based on the assumption that the short-term statistical distributions of speech and noise differ. In contrast, we propose a noise reduction method based on the assumption that the temporal modulations of speech and noise differ. The proposed algorithm has two steps: temporal modulation contrast normalization and modulation-event-preserved smoothing. Because the proposed method reduces noise independently of the traditional methods, it can be combined with them to further reduce the effect of noise. We tested the proposed method as a front-end for robust speech recognition. Two advanced noise reduction methods, the ETSI advanced front-end (AFE) and particle filtering (PF) with minimum mean square error (MMSE) estimation, were used for comparison and in combination. Experimental results showed that the proposed method outperforms both advanced methods as an independent front-end processor and that, when combined with either of them, it consistently improves performance over using each method alone.
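The abstract names the two steps but gives no formulas, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes the features are a (bands x frames) array of log-energy trajectories, uses per-band mean and variance normalization as a stand-in for contrast normalization, and uses a running median as a stand-in for smoothing that keeps abrupt modulation events. The function names `contrast_normalize` and `median_smooth`, the window width, and the input layout are all hypothetical.

```python
import numpy as np

def contrast_normalize(traj, eps=1e-8):
    """Scale each band's temporal trajectory to zero mean and unit variance.

    Assumption: a stand-in for the paper's contrast normalization,
    whose exact form is not given in the abstract.
    """
    mean = traj.mean(axis=1, keepdims=True)
    std = traj.std(axis=1, keepdims=True)
    return (traj - mean) / (std + eps)

def median_smooth(traj, width=5):
    """Running median along the time axis (width must be odd).

    Assumption: a median filter is used here because it removes short
    spikes while keeping step-like onsets and offsets; the paper's
    smoothing step may differ.
    """
    if width % 2 != 1:
        raise ValueError("width must be odd")
    half = width // 2
    n_frames = traj.shape[1]
    # Repeat the edge frames so the output has the same length as the input.
    padded = np.pad(traj, ((0, 0), (half, half)), mode="edge")
    # Stack the shifted copies so the last axis holds each frame's window.
    windows = np.stack(
        [padded[:, i:i + n_frames] for i in range(width)], axis=-1
    )
    return np.median(windows, axis=-1)

def modulation_front_end(traj, width=5):
    """Apply normalization, then smoothing, to (bands x frames) features."""
    return median_smooth(contrast_normalize(traj), width=width)
```

In use, `modulation_front_end(features)` would be applied to the feature trajectories of one utterance, optionally after another noise reduction stage, in line with the abstract's point that the method can be combined with other front-ends.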
Keywords :
feature extraction; least mean squares methods; particle filtering (numerical methods); signal denoising; smoothing methods; speech intelligibility; speech processing; speech recognition; statistical distributions; AFE; ETSI advanced front-end method; MMSE; independent front-end processor; minimum mean square error estimation; modulation event preserved smoothing; noise reduction; particle filtering; robust speech feature extraction; robust speech recognition; short-term statistical distribution; speech intelligibility; temporal modulation contrast normalization; Feature extraction; Filtering; Noise reduction; Noise robustness; Smoothing methods; Speech enhancement; Speech recognition; Statistical distributions; Telecommunication standards; Testing;
Conference_Title :
Image and Signal Processing, 2009. CISP '09. 2nd International Congress on
Conference_Location :
Tianjin
Print_ISBN :
978-1-4244-4129-7
Electronic_ISBN :
978-1-4244-4131-0
DOI :
10.1109/CISP.2009.5303903