DocumentCode :
1409319
Title :
An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions
Author :
Li, Qi ; Huang, Yan
Author_Institution :
Li Creative Technol., Inc., Florham Park, NJ, USA
Volume :
19
Issue :
6
fYear :
2011
Firstpage :
1791
Lastpage :
1801
Abstract :
An auditory-based feature extraction algorithm is presented. We name the new features cochlear filter cepstral coefficients (CFCCs); they are defined based on a recently developed auditory transform (AT) plus a set of modules that emulate the signal processing functions of the cochlea. The CFCC features are applied to a speaker identification task to address the acoustic mismatch problem between training and testing environments. Usually, the performance of acoustic models trained on clean speech drops significantly when they are tested on noisy speech. The CFCC features show strong robustness in this kind of situation. In our experiments, the CFCC features consistently perform better than the baseline MFCC features under all three mismatched testing conditions: white noise, car noise, and babble noise. For example, in clean conditions, MFCC and CFCC features perform similarly, with accuracy over 96%, but when the signal-to-noise ratio (SNR) of the input signal is 6 dB, the accuracy of the MFCC features drops to 41.2%, while the CFCC features still achieve an accuracy of 88.3%. The proposed CFCC features also compare favorably to perceptual linear predictive (PLP) and RASTA-PLP features. The CFCC features consistently perform much better than PLP. Under white noise, the CFCC features are significantly better than RASTA-PLP, while under car and babble noise, the CFCC features perform similarly to RASTA-PLP.
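The abstract reports results at a given signal-to-noise ratio (for example, 6 dB). As a rough illustration of what such a mismatched test condition means, the sketch below mixes white Gaussian noise into a clean signal at a target SNR, using the standard definition SNR = 10 log10(P_signal / P_noise). This record does not describe how the authors mixed noise, so the function names `add_white_noise` and `measure_snr_db`, the synthetic sine-wave "speech", and the 16 kHz sampling rate are all assumptions made for illustration only.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return (noisy_signal, noise) with white Gaussian noise scaled so that
    10*log10(signal_power / noise_power) equals snr_db.

    Hypothetical helper: not the noise-mixing procedure of the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)

    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)

    # Noise power required to reach the requested SNR.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return signal + noise, noise


def measure_snr_db(signal, noise):
    """Measure the SNR in dB from separate signal and noise arrays."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


# Assumed stand-in for clean speech: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 440.0 * t)

noisy, noise = add_white_noise(clean, snr_db=6.0, rng=np.random.default_rng(0))
measured = measure_snr_db(clean, noise)
```

In a mismatched-condition experiment of the kind the abstract describes, models would be trained on features from the clean signal and tested on features from the noisy one; car or babble noise would replace the synthetic white noise with recorded noise scaled the same way.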
Keywords :
feature extraction; filtering theory; noise; speaker recognition; RASTA-PLP features; acoustic mismatch problem; acoustic models; auditory transform; auditory-based feature extraction; babble noise; car noise; cochlear filter cepstral coefficients; mismatched conditions; mismatched testing conditions; perceptual linear predictive features; robust speaker identification; signal processing functions; white noise; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Time frequency analysis; Transforms; Auditory-based features; automatic speaker recognition (ASR); cochlea; feature extraction algorithm; robust speaker recognition; speaker identification;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2010.2101594
Filename :
5672773