Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises

Author

Deng, Jing ; Zheng, Thomas Fang ; Song, Zhan-Jiang ; Liu, Jian ; Wu, Wen-Hu

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

Volume

8

fYear

2005

fDate

18-21 Aug. 2005

Firstpage

4846

Abstract

In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. One of the major issues with MFCCs is that they are very sensitive to additive noise. In this paper, two methods for robust speech front-ends are proposed. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible, in order to restore the power spectrum of the original clean speech. This estimated spectrum then replaces the spectrum in the traditional MFCC calculation, and the resulting features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second incorporates subband power information with subband mel-spectrum centroid information after the outputs of the traditional mel-filter banks; the resulting features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system; with the CMS method, the average error rate can be reduced by 12.2% compared with DPS-based cepstral coefficients.
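The abstract names several front-end operations without giving formulas. Below is a minimal pure-Python sketch, under stated assumptions, of three of them: the outputs of a triangular mel filter bank together with a power-weighted centroid for each subband, a first-order differential power spectrum, and cepstral mean subtraction. The function names, the 2595·log10 mel formula, the centroid being expressed in Hz, and the DPS form D(k) = |P(k) - P(k+1)| are assumptions chosen for illustration. The paper's predictive difference function (the core of PDPS) and its exact way of combining subband power with centroid information are not described in this record and are not reproduced here.

```python
import math

# Hypothetical helpers for illustration only; NOT the authors' implementation.


def hz_to_mel(f):
    """Convert Hz to mel (assumed common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters over n_bins spectrum bins spanning 0..Nyquist."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 equally spaced mel points give each filter its
    # lower edge, centre and upper edge.
    edges = [mel_to_hz(i * top / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    filters = []
    for m in range(1, n_filters + 1):
        lo, ctr, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for f in bin_freqs:
            if lo < f <= ctr:
                weights.append((f - lo) / (ctr - lo))  # rising slope
            elif ctr < f < hi:
                weights.append((hi - f) / (hi - ctr))  # falling slope
            else:
                weights.append(0.0)  # outside this filter
        filters.append(weights)
    return filters, bin_freqs


def subband_power_and_centroid(power_spec, filters, bin_freqs):
    """Per-filter output power and power-weighted centroid frequency (Hz)."""
    powers, centroids = [], []
    for weights in filters:
        weighted = [w * p for w, p in zip(weights, power_spec)]
        total = sum(weighted)
        powers.append(total)
        # Centroid = sum_k f_k w(k) P(k) / sum_k w(k) P(k); 0.0 for an empty band.
        centroids.append(
            sum(f * x for f, x in zip(bin_freqs, weighted)) / total if total > 0 else 0.0
        )
    return powers, centroids


def differential_power_spectrum(power_spec):
    """Assumed first-order DPS: D(k) = |P(k) - P(k+1)|, K-1 values."""
    return [abs(power_spec[k] - power_spec[k + 1]) for k in range(len(power_spec) - 1)]


def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of one utterance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]
```

For example, with `mel_filterbank(4, 9, 8000)` and a spectrum whose only nonzero bin is at 2000 Hz, the two filters that overlap 2000 Hz report a centroid of 2000 Hz and the other two report 0.0. A centroid on the mel scale, if preferred, can be obtained by passing it through `hz_to_mel`.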

Keywords

cepstral analysis; feature extraction; filtering theory; noise; speech recognition; mel-filter bank; mel-scaled frequency cepstral coefficient; predictive differential power spectrum; speaker recognition; stationary noises; subband mel-spectrum centroid; Additive noise; Cepstral analysis; Collision mitigation; Data mining; Feature extraction; Mel frequency cepstral coefficient; Noise level; Noise robustness; Speaker recognition; Speech; Robust; difference function; subband;

fLanguage

English

Publisher

IEEE

Conference_Title

Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on

Conference_Location

Guangzhou, China

Print_ISBN

0-7803-9091-1

Type

conf

DOI

10.1109/ICMLC.2005.1527796

Filename

1527796