Regional Information Center for Science and Technology - An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions

DocumentCode :

1409319

Title :

An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions

Author :

Li, Qi ; Huang, Yan

Author_Institution :

Li Creative Technologies, Inc., Florham Park, NJ, USA

Volume :

Issue :

fYear :

2011

Firstpage :

1791

Lastpage :

1801

Abstract :

An auditory-based feature extraction algorithm is presented. We name the new features cochlear filter cepstral coefficients (CFCCs); they are defined based on a recently developed auditory transform (AT) plus a set of modules that emulate the signal processing functions of the cochlea. The CFCC features are applied to a speaker identification task to address the acoustic mismatch problem between training and testing environments. Usually, the performance of acoustic models trained on clean speech drops significantly when they are tested on noisy speech. The CFCC features have shown strong robustness in this kind of situation. In our experiments, the CFCC features consistently perform better than the baseline MFCC features under all three mismatched testing conditions: white noise, car noise, and babble noise. For example, in clean conditions, MFCC and CFCC features perform similarly, with accuracy over 96%, but when the signal-to-noise ratio (SNR) of the input signal is 6 dB, the accuracy of the MFCC features drops to 41.2%, while the CFCC features still achieve an accuracy of 88.3%. The proposed CFCC features also compare favorably to perceptual linear predictive (PLP) and RASTA-PLP features. The CFCC features consistently perform much better than PLP. Under white noise, the CFCC features are significantly better than RASTA-PLP, while under car and babble noise, they perform similarly to RASTA-PLP.
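The abstract reports accuracy at a 6 dB SNR but does not say how the noisy test signals were produced. As an illustration only, the sketch below shows one common way to build such a mismatched test signal: scale a noise recording so that the ratio of signal power to noise power matches a target SNR, then add it to the clean signal. The function names `mix_at_snr` and `measured_snr_db`, the 440 Hz test tone, and the 16 kHz sampling rate are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` after scaling it to reach the target SNR in dB.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if signal_power == 0 or noise_power == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = [y - x for x, y in zip(clean, noisy)]
    clean_power = sum(x * x for x in clean) / len(clean)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(clean_power / residual_power)


# Assumed test material: one second of a 440 Hz tone at 16 kHz plus white noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
white = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, white, snr_db=6.0)
```

Calling `measured_snr_db(clean, noisy)` on the result is a quick check that the mix landed at the requested level. Car or babble noise would be mixed the same way, with a recorded noise segment in place of the synthetic white noise.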

Keywords :

feature extraction; filtering theory; noise; speaker recognition; RASTA-PLP features; acoustic mismatch problem; acoustic models; auditory transform; auditory-based feature extraction; babble noise; car noise; cochlear filter cepstral coefficients; mismatched conditions; mismatched testing conditions; perceptual linear predictive features; robust speaker identification; signal processing functions; white noise; Auditory system; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Time frequency analysis; Transforms; Auditory-based features; automatic speaker recognition (ASR); cochlea; feature extraction algorithm; robust speaker recognition; speaker identification;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2101594

Filename :

5672773

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1409319