DocumentCode :
3165465
Title :
Intonational speaker verification: A study on parameters and performance under noisy conditions
Author :
Siddiq, Sadjad ; Kinnunen, Tomi ; Vainio, Martti ; Werner, Stefan
Author_Institution :
Univ. of Eastern Finland, Joensuu, Finland
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4777
Lastpage :
4780
Abstract :
Prosody-based speaker verification using fundamental frequency (f0) is considered. Our study consists of two phases. First, we do extensive optimization of parameters to establish a baseline system before dealing with noisy conditions. This includes a study of f0 extractor parameters, choice of features (discrete cosine transform, discrete Fourier transform, Legendre polynomials, linear prediction), f0 track interpolation (none, linear, Hermite), framing parameters and windowing (none, Hamming), f0 representation domain (linear, log), number of transformation coefficients and, finally, use of higher-level delta coefficients. Using the optimized parameters, we then explore the robustness of prosody features under white noise and factory noise degradations. Using a GMM-UBM system on the NIST 2006 SRE corpus, we reach an EER of 28.4 % and 27.6 % for the intonational and MFCC features respectively at -20 dB SNR white noise contamination; fusion of the two yields an EER of 24.38 %.
Keywords :
discrete Fourier transforms; discrete cosine transforms; polynomials; speaker recognition; Legendre polynomials; baseline system; delta coefficients; discrete Fourier transform; discrete cosine transform; extractor parameters; factory noise degradation; framing parameters; fundamental frequency; intonational speaker verification; linear prediction; noisy conditions; prosody based speaker verification; prosody features; representation domain; track interpolation; transformation coefficients; white noise contamination; Feature extraction; Interpolation; Mel frequency cepstral coefficient; Speech; prosodic features
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288987
Filename :
6288987