Intonational speaker verification: A study on parameters and performance under noisy conditions

Author

Siddiq, Sadjad ; Kinnunen, Tomi ; Vainio, Martti ; Werner, Stefan

Author_Institution

Univ. of Eastern Finland, Joensuu, Finland

fYear

2012

fDate

25-30 March 2012

Firstpage

4777

Lastpage

4780

Abstract

This paper considers prosody-based speaker verification using the fundamental frequency (f₀). Our study consists of two phases. First, we perform extensive parameter optimization to establish a baseline system before dealing with noisy conditions. This includes a study of f₀ extractor parameters, feature choice (discrete cosine transform, discrete Fourier transform, Legendre polynomials, linear prediction), f₀ track interpolation (none, linear, Hermite), framing parameters and windowing (none, Hamming), f₀ representation domain (linear, log), the number of transformation coefficients and, finally, the use of higher-level delta coefficients. Using the optimized parameters, we then explore the robustness of prosodic features under white noise and factory noise degradation. Using a GMM-UBM system on the NIST 2006 SRE corpus, we reach equal error rates (EERs) of 28.4 % and 27.6 % for the intonational and MFCC features, respectively, at -20 dB SNR white noise contamination; fusing the two yields an EER of 24.38 %.
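The abstract lists the f₀ front-end choices that were optimized: track interpolation, representation domain, framing and Hamming windowing, number of transformation coefficients, and delta coefficients. The Python sketch below shows one way these steps can be chained for the DCT variant. It is a minimal illustration under stated assumptions, not the authors' implementation: unvoiced frames are assumed to be marked with f₀ = 0, only linear interpolation is shown, and the function names and defaults (frame length 30, shift 10, 6 coefficients) are hypothetical choices, not the optimized values reported in the paper.

```python
import math
from typing import List, Sequence


def interpolate_unvoiced(f0: Sequence[float]) -> List[float]:
    """Fill unvoiced frames (assumed marked as f0 <= 0) by linear interpolation
    between the nearest voiced neighbours; leading/trailing gaps copy the
    nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = [float(v) for v in f0]
    for i, v in enumerate(f0):
        if v > 0:
            continue
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            out[i] = float(f0[right])
        elif right is None:
            out[i] = float(f0[left])
        else:
            t = (i - left) / (right - left)
            out[i] = f0[left] + t * (f0[right] - f0[left])
    return out


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dct2(x: Sequence[float], n_coeffs: int) -> List[float]:
    """First n_coeffs coefficients of an unnormalised DCT-II of x."""
    n = len(x)
    return [
        sum(x[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n_coeffs)
    ]


def deltas(frames: List[List[float]]) -> List[List[float]]:
    """Two-point delta (c[t+1] - c[t-1]) / 2, with indices clamped at the edges.
    This is one common delta definition, assumed here for illustration."""
    last = len(frames) - 1
    return [
        [(a - b) / 2 for a, b in zip(frames[min(t + 1, last)], frames[max(t - 1, 0)])]
        for t in range(len(frames))
    ]


def contour_features(f0, frame_len=30, shift=10, n_coeffs=6,
                     use_log=True, use_window=True):
    """Hypothetical front end: interpolate, optionally log-transform, frame,
    optionally Hamming-window and DCT an f0 track, then append deltas.
    Default parameter values are illustrative only."""
    x = interpolate_unvoiced(f0)
    if use_log:
        x = [math.log(v) for v in x]
    win = hamming(frame_len)
    static = []
    for start in range(0, len(x) - frame_len + 1, shift):
        seg = x[start:start + frame_len]
        if use_window:
            seg = [s * w for s, w in zip(seg, win)]
        static.append(dct2(seg, n_coeffs))
    # Each output vector = n_coeffs static coefficients + n_coeffs deltas.
    return [c + d for c, d in zip(static, deltas(static))]
```

With these defaults, a 100-frame f₀ track yields 8 feature vectors of 12 values each (6 static DCT coefficients plus 6 deltas). Swapping `dct2` for a Legendre-polynomial or DFT projection, or replacing linear interpolation with Hermite interpolation, would correspond to the other alternatives listed in the abstract.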

Keywords

discrete Fourier transforms; discrete cosine transforms; polynomials; speaker recognition; Legendre polynomials; baseline system; delta coefficients; discrete Fourier transform; discrete cosine transform; extractor parameters; factory noise degradation; framing parameters; fundamental frequency; intonational speaker verification; linear prediction; noisy conditions; prosody based speaker verification; prosody features; representation domain; track interpolation; transformation coefficients; white noise contamination; Feature extraction; Interpolation; Mel frequency cepstral coefficient; Speech; prosodic features

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288987

Filename

6288987