Title :
Automatic speech recognition with an adaptation model motivated by auditory processing
Author :
Holmberg, Marcus ; Gelbart, David ; Hemmert, Werner
Author_Institution :
Infineon Technol. AG, Munich, Germany
Abstract :
The mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) feature extraction typically used for automatic speech recognition (ASR) employs several principles which have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or Bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask whether one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks against ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both against and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy which is competitive with RASTA, CMS, and Wiener filtering, and which performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is applied before both approaches, on AURORA 3 as well.
Keywords :
Wiener filters; cepstral analysis; feature extraction; speech recognition; Wiener filtering; adaptation model; amplitude compression; auditory processing; automatic speech recognition; bark-warping; cepstral mean subtraction; frequency decomposition; mel-frequency cepstral coefficient; mel-warping; perceptual linear prediction feature extraction; physiological processing; synaptic adaptation; Collision mitigation; Humans; Psychoacoustic models; Neural adaptation; noise robustness
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.860349