• DocumentCode
    442172
  • Title
    Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises
  • Author
    Deng, Jing ; Zheng, Thomas Fang ; Song, Zhan-Jiang ; Liu, Jian ; Wu, Wen-Hu
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    8
  • fYear
    2005
  • fDate
    18-21 Aug. 2005
  • Firstpage
    4846
  • Abstract
    In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-end features. A major issue with MFCCs is that they are very sensitive to additive noise. In this paper, two methods for building robust speech front-ends are proposed. The first uses a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible, in order to restore the power spectrum of the original clean speech. The spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the resulting features are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The second combines subband power information with subband mel-spectrum centroid information, computed from the outputs of the traditional mel-filter banks. The resulting features are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and differential power spectrum (DPS) based cepstral coefficients at different noise levels. Experimental results show that PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system: with the CMS method, the average error rate can be reduced by 12.2% in comparison with DPS based cepstral coefficients.
  • Keywords
    cepstral analysis; feature extraction; filtering theory; noise; speech recognition; mel-filter bank; mel-scaled frequency cepstral coefficient; predictive differential power spectrum; speaker recognition; stationary noises; subband mel-spectrum centroid; Additive noise; Collision mitigation; Data mining; Mel frequency cepstral coefficient; Noise level; Noise robustness; Speech; Robust; difference function; subband
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
  • Conference_Location
    Guangzhou, China
  • Print_ISBN
    0-7803-9091-1
  • Type
    conf
  • DOI
    10.1109/ICMLC.2005.1527796
  • Filename
    1527796