DocumentCode
417116
Title
Using Haar transformed vocal source information for automatic speaker recognition
Author
Zheng, Nengheng ; Ching, P.C.
Author_Institution
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin, China
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
This paper investigates the effectiveness of incorporating vocal source information to improve automatic speaker recognition accuracy. We propose a new method to extract discriminative features from the linear prediction (LP) residual signal, which are closely related to the glottal excitation of an individual speaker. A parameter set called Haar octave coefficients of residue (HOCOR), complementary to the commonly used linear predictive cepstral coefficients (LPCC), is obtained by applying a Haar transform to the LP residue. This additional feature vector retains the spectro-temporal characteristics of the source excitation sequences that are related to the fundamental frequency, harmonics, and their phases. Experimental evaluation on the YOHO corpus demonstrates the high speaker discriminative power and high inter-speaker variability of HOCOR. Speaker recognition tests using both vocal tract features (LPCC) and vocal source information (HOCOR) outperform the conventional approach of using LPCC only.
Keywords
Haar transforms; cepstral analysis; feature extraction; speaker recognition; time-frequency analysis; HOCOR; Haar octave coefficients of residue; Haar transformed vocal source information; LPCC; automatic speaker recognition; discriminative feature extraction; individual speaker glottal excitation; inter-speaker variability; linear prediction residual signal; linear predictive cepstral coefficients; residue time-frequency analysis; source excitation sequence spectro-temporal characteristics; speaker discriminative power; vocal tract features; Automatic speech recognition; Cepstral analysis; Data mining; Feature extraction; Fourier transforms; Mel frequency cepstral coefficient; Partial response channels; Speaker recognition; Testing; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1325926
Filename
1325926