DocumentCode :
442172
Title :
Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises
Author :
Deng, Jing ; Zheng, Thomas Fang ; Song, Zhan-Jiang ; Liu, Jian ; Wu, Wen-Hu
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
8
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
4846
Abstract :
In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. A major issue with MFCCs is that they are very sensitive to additive noise. In this paper, two methods for robust speech front-ends are proposed. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible, in order to restore the power spectrum of the original clean speech. The spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the resulting features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second incorporates subband power information with subband mel-spectrum centroid information after the outputs of the traditional mel-filter banks. The resulting features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system: with the CMS method, the average error rate can be reduced by 12.2% in comparison with DPS-based cepstral coefficients.
Keywords :
cepstral analysis; feature extraction; filtering theory; noise; speech recognition; feature extraction; mel-filter bank; mel-scaled frequency cepstral coefficient; predictive differential power spectrum; speaker recognition; stationary noises; subband mel-spectrum centroid; Additive noise; Cepstral analysis; Collision mitigation; Data mining; Feature extraction; Mel frequency cepstral coefficient; Noise level; Noise robustness; Speaker recognition; Speech; Robust; difference function; speaker recognition; subband;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527796
Filename :
1527796