Title :
Localized spectro-temporal cepstral analysis of speech
Author :
Bouvrie, Jake ; Ezzat, Tony ; Poggio, Tomaso
Author_Institution :
Center for Biol. & Comput. Learning, Massachusetts Inst. of Technol., Cambridge, MA
Date :
March 31 2008-April 4 2008
Abstract :
Drawing on recent progress in auditory neuroscience, we present a novel speech feature analysis technique based on localized spectro-temporal cepstral analysis of speech. We proceed by extracting localized 2D patches from the spectrogram and projecting them onto a 2D discrete cosine transform (2D-DCT) basis. For each time frame, a speech feature vector is then formed by concatenating low-order 2D-DCT coefficients from the set of corresponding patches. We argue that our framework has significant advantages over standard one-dimensional MFCC features. In particular, we find that our features are more robust to noise and better capture the temporal modulations important for recognizing plosive sounds. We evaluate the performance of the proposed features on a TIMIT classification task in clean, pink, and babble noise conditions, and show that our feature analysis outperforms traditional features based on MFCCs.
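The pipeline described in the abstract (localized spectrogram patches, a 2D-DCT projection, and concatenation of low-order coefficients per time frame) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `dct_basis` and `patch_dct_features`, the patch size, the frequency step, the edge padding in time, and the choice of a square `keep` x `keep` block as the "low-order" coefficients are all assumptions that do not come from the record.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    i = np.arange(n)
    k = i[:, None]
    basis = np.cos(np.pi * (2 * i[None, :] + 1) * k / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so all rows have unit norm
    return basis


def patch_dct_features(spec, patch_f=8, patch_t=8, step_f=4, keep=3):
    """Map a (n_freq, n_frames) spectrogram to (n_frames, n_features).

    All parameter defaults are hypothetical values chosen for illustration.
    """
    n_freq, n_frames = spec.shape
    half = patch_t // 2
    # Assumption: replicate edge frames so every frame gets a full-width patch.
    padded = np.pad(spec, ((0, 0), (half, patch_t - half - 1)), mode="edge")
    basis_f, basis_t = dct_basis(patch_f), dct_basis(patch_t)

    features = []
    for t in range(n_frames):
        frame_feats = []
        # Slide overlapping patches along the frequency axis for this frame.
        for f0 in range(0, n_freq - patch_f + 1, step_f):
            patch = padded[f0:f0 + patch_f, t:t + patch_t]
            # Separable 2D-DCT: transform along frequency, then along time.
            coeffs = basis_f @ patch @ basis_t.T
            # Assumption: "low-order" means the top-left keep x keep block.
            frame_feats.append(coeffs[:keep, :keep].ravel())
        features.append(np.concatenate(frame_feats))
    return np.array(features)
```

Called on a log-spectrogram with 32 frequency bins, the defaults above give 7 patches per frame and 9 coefficients per patch, so each frame maps to a 63-dimensional vector. In practice the patch geometry and the number of retained coefficients would be tuned for the task.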
Keywords :
discrete cosine transforms; pattern classification; speech processing; speech recognition; 1D MFCC features; auditory neuroscience; localized 2D patches; spectrotemporal cepstral analysis; speech feature vector; 1/f noise; Acoustic noise; Cepstral analysis; Mel frequency cepstral coefficient; Neuroscience; Noise robustness; Performance analysis; Spectrogram; Speech analysis; Nervous system
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518714