Title :
A new phase-based feature representation for robust speech recognition
Author :
Loweimi, Erfan ; Ahadi, Seyed Mohammad ; Drugman, Thomas
Author_Institution :
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
Abstract :
The aim of this paper is to introduce a novel phase-based feature representation for robust speech recognition. The method consists of four main parts: autoregressive (AR) model extraction, group delay function (GDF) computation, compression, and scale information augmentation. Coupling the GDF with an AR model results in a high-resolution estimate of the power spectrum with low frequency leakage. The compression step includes two stages similar to those of MFCC, but without taking the logarithm of the output energies. The fourth part augments the phase-based feature vector with scale information, which is based on the Hilbert transform relations and complements the phase spectrum information. In the presence of additive and convolutional noise, the proposed method yields 15% and 12% reductions, respectively, in the average error rate (SNRs ranging from 0 to 20 dB) compared to standard MFCCs.
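As a rough sketch of the first two stages described above (not the authors' implementation): AR coefficients can be estimated with the autocorrelation method, and the group delay of the resulting all-pole model evaluated with SciPy. The function name `ar_group_delay` and the default `order`/`nfft` values are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import group_delay

def ar_group_delay(frame, order=12, nfft=512):
    """Sketch: group delay of an all-pole (AR) model of one speech frame.

    Illustrative only; model order and FFT size are assumptions,
    not values from the paper.
    """
    # Autocorrelation method (Yule-Walker) for the AR/LPC coefficients
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    a_poly = np.concatenate(([1.0], -a))   # A(z) = 1 - sum_k a_k z^{-k}

    # Group delay of the all-pole model H(z) = 1 / A(z)
    w, gd = group_delay(([1.0], a_poly), w=nfft)
    return w, gd
```

Because the autocorrelation method yields a minimum-phase A(z), the all-pole model is stable and its group delay is well defined across the frequency grid. The compression and scale-augmentation stages of the paper are not shown.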
Keywords :
Hilbert transforms; autoregressive processes; error statistics; feature extraction; image representation; speech recognition; AR model extraction; GDF computation; Hilbert transform relations; additive noises; autoregressive model extraction; averaged error rates; compression step; convolutional noises; feature vector; group delay function computation; high-resolution estimation; low frequency leakage; phase spectrum information; phase-based feature representation; power spectrum; robust speech recognition; scale information augmentation; standard MFCC; Abstracts; Mel frequency cepstral coefficient; Robustness; Speech; Speech recognition; Speech phase spectrum; compression; feature extraction; group delay; scale information;
Conference_Title :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639051