Title :
Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises
Author :
Deng, Jing ; Zheng, Thomas Fang ; Song, Zhan-Jiang ; Liu, Jian ; Wu, Wen-Hu
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. A major weakness of MFCCs is that they are very sensitive to additive noise. This paper proposes two methods for building robust speech front-ends. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible, in order to restore the power spectrum of the original clean speech. The spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the resulting features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second incorporates subband power information together with subband mel-spectrum centroid information after the outputs of the traditional mel-filter banks. The resulting features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that the PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system; with the CMS method, the average error rate can be reduced by 12.2% in comparison with DPS based cepstral coefficients.
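As a rough illustration of the two ideas named in the abstract, the sketch below computes a plain first-order differential power spectrum and a per-subband power and centroid from filter-bank outputs. It is a generic textbook-style sketch, not the authors' method: the abstract does not give the predictive difference function or the exact SMSC definition, so the adjacent-bin difference, the equal-width subband split, the centroid measured in filter-index units, and the function names `differential_power_spectrum` and `subband_power_and_centroid` are all assumptions made for illustration.

```python
def differential_power_spectrum(power_spec):
    """First-order DPS (assumed form): |P(k) - P(k+1)| for adjacent bins.

    A noise floor that is equal in neighbouring bins cancels in the
    difference, which is the usual motivation for DPS features.
    """
    return [abs(a - b) for a, b in zip(power_spec, power_spec[1:])]


def subband_power_and_centroid(fbank_energies, n_subbands):
    """Split filter-bank outputs into contiguous, near-equal subbands.

    Returns two lists: total power per subband, and the energy-weighted
    mean filter index (centroid) per subband. Hypothetical helper; the
    subband layout is an assumption, not taken from the paper.
    """
    n = len(fbank_energies)
    if not 1 <= n_subbands <= n:
        raise ValueError("n_subbands must be between 1 and len(fbank_energies)")
    powers, centroids = [], []
    for b in range(n_subbands):
        lo = b * n // n_subbands          # first filter index in this subband
        hi = (b + 1) * n // n_subbands    # one past the last filter index
        band = fbank_energies[lo:hi]
        total = sum(band)
        powers.append(total)
        if total > 0:
            weighted = sum(i * e for i, e in zip(range(lo, hi), band))
            centroids.append(weighted / total)
        else:
            # Empty-energy subband: fall back to the geometric midpoint.
            centroids.append((lo + hi - 1) / 2)
    return powers, centroids


if __name__ == "__main__":
    clean = [1.0, 3.0, 2.0]
    noisy = [p + 5.0 for p in clean]      # flat additive noise floor
    print(differential_power_spectrum(clean))
    print(differential_power_spectrum(noisy))
    print(subband_power_and_centroid([1.0, 1.0, 0.0, 2.0], 2))
```

In this toy case the flat offset added to `clean` leaves the differential spectrum unchanged, while the centroid shows where the energy sits inside each subband, which the subband power alone does not capture.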
Keywords :
cepstral analysis; feature extraction; filtering theory; noise; speech recognition; mel-filter bank; mel-scaled frequency cepstral coefficient; predictive differential power spectrum; speaker recognition; stationary noises; subband mel-spectrum centroid; Additive noise; Collision mitigation; Data mining; Mel frequency cepstral coefficient; Noise level; Noise robustness; Speech; Robust; difference function; subband
Conference_Title :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527796