• DocumentCode
    1409319
  • Title

    An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions

  • Author

    Li, Qi ; Huang, Yan

  • Author_Institution
    Li Creative Technol., Inc., Florham Park, NJ, USA
  • Volume
    19
  • Issue
    6
  • fYear
    2011
  • Firstpage
    1791
  • Lastpage
    1801
  • Abstract
    An auditory-based feature extraction algorithm is presented. We name the new features cochlear filter cepstral coefficients (CFCCs); they are defined based on a recently developed auditory transform (AT) plus a set of modules that emulate the signal processing functions of the cochlea. The CFCC features are applied to a speaker identification task to address the acoustic mismatch problem between training and testing environments. Usually, the performance of acoustic models trained on clean speech drops significantly when they are tested on noisy speech. The CFCC features have shown strong robustness in this kind of situation. In our experiments, the CFCC features consistently perform better than the baseline MFCC features under all three mismatched testing conditions: white noise, car noise, and babble noise. For example, in clean conditions, MFCC and CFCC features perform similarly, with accuracy over 96%, but when the signal-to-noise ratio (SNR) of the input signal is 6 dB, the accuracy of the MFCC features drops to 41.2%, while the CFCC features still achieve an accuracy of 88.3%. The proposed CFCC features also compare favorably to perceptual linear predictive (PLP) and RASTA-PLP features. The CFCC features consistently perform much better than PLP. Under white noise, the CFCC features are significantly better than RASTA-PLP, while under car and babble noise, the CFCC features perform similarly to RASTA-PLP.
  • Keywords
    feature extraction; filtering theory; noise; speaker recognition; RASTA-PLP features; acoustic mismatch problem; acoustic models; auditory transform; auditory-based feature extraction; babble noise; car noise; cochlear filter cepstral coefficients; mismatched conditions; mismatched testing conditions; perceptual linear predictive features; robust speaker identification; signal processing functions; white noise; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Time frequency analysis; Transforms; Auditory-based features; automatic speaker recognition (ASR); cochlea; feature extraction algorithm; robust speaker recognition; speaker identification;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2010.2101594
  • Filename
    5672773