DocumentCode
442172
Title
Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises
Author
Deng, Jing ; Zheng, Thomas Fang ; Song, Zhan-Jiang ; Liu, Jian ; Wu, Wen-Hu
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume
8
fYear
2005
fDate
18-21 Aug. 2005
Firstpage
4846
Abstract
In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. A major weakness of MFCCs is that they are very sensitive to additive noise. In this paper, two methods for building robust speech front-ends are proposed. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible, in order to restore the power spectrum of the original clean speech. The spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the resulting features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second combines subband power information with subband mel-spectrum centroid information following the outputs of the traditional mel-filter banks. The resulting features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system; with the CMS method, the average error rate can be reduced by 12.2% in comparison with DPS-based cepstral coefficients.
Keywords
cepstral analysis; feature extraction; filtering theory; noise; speech recognition; mel-filter bank; mel-scaled frequency cepstral coefficient; predictive differential power spectrum; speaker recognition; stationary noises; subband mel-spectrum centroid; Additive noise; Collision mitigation; Data mining; Mel frequency cepstral coefficient; Noise level; Noise robustness; Speech; Robust; difference function; subband
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location
Guangzhou, China
Print_ISBN
0-7803-9091-1
Type
conf
DOI
10.1109/ICMLC.2005.1527796
Filename
1527796