Using Haar transformed vocal source information for automatic speaker recognition

Author

Zheng, Nengheng ; Ching, P.C.

Author_Institution

Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Shatin, China

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

This paper investigates the effectiveness of incorporating vocal source information to improve automatic speaker recognition accuracy. We propose a new method for extracting discriminative features from the linear prediction (LP) residual signal; these features are closely related to the glottal excitation of the individual speaker. A parameter set called Haar octave coefficients of residue (HOCOR), complementary to the commonly used linear predictive cepstral coefficients (LPCC), is obtained by applying a Haar transform to the LP residue. This additional feature vector retains the spectro-temporal characteristics of the source excitation sequences that are related to the fundamental frequency, the harmonics, and their phases. Experimental evaluation on the YOHO corpus shows that HOCOR has high speaker-discriminative power and high inter-speaker variability. Speaker recognition tests that combine vocal tract features (LPCC) with vocal source information (HOCOR) outperform the conventional approach of using LPCC alone.
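
The abstract names two processing stages, LP analysis to obtain a residual and a Haar transform of that residual, but does not give the exact HOCOR definition. The following is a minimal Python sketch of those two stages only, not the authors' implementation. All function names, the use of the autocorrelation method with a Levinson-Durbin recursion, the power-of-two frame length, and the per-octave energy summary are assumptions made for illustration.

```python
import math


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lp_coefficients(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns a[1..order] such that s[n] is approximated by
    sum_k a[k] * s[n-k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, order):
    """Prediction error e[n] = s[n] - sum_k a[k] * s[n-k]."""
    a = lp_coefficients(frame, order)
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a[k - 1] * frame[n - k]
                         for k in range(1, order + 1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def haar_octaves(signal):
    """Orthonormal Haar decomposition of a power-of-two-length signal.

    Returns (details, approx): details[0] is the finest octave band,
    details[-1] the coarsest, and approx is the final scaling coefficient.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two >= 2")
    scale = math.sqrt(2.0)
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / scale for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / scale for i in pairs]
    return details, approx[0]


def octave_energies(details):
    """Hypothetical summary: energy of the coefficients in each octave band."""
    return [sum(c * c for c in band) for band in details]
```

A frame would be passed through `lp_residual` and the result through `haar_octaves`. Because the transform is orthonormal, the band energies plus the squared scaling coefficient sum to the residual energy. How the coefficients are actually grouped into the HOCOR vector is specified in the paper itself, not here.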

Keywords

Haar transforms; cepstral analysis; feature extraction; speaker recognition; time-frequency analysis; HOCOR; Haar octave coefficients of residue; Haar transformed vocal source information; LPCC; automatic speaker recognition; discriminative feature extraction; individual speaker glottal excitation; inter-speaker variability; linear prediction residual signal; linear predictive cepstral coefficients; residue time-frequency analysis; source excitation sequence spectro-temporal characteristics; speaker discriminative power; vocal tract features; Automatic speech recognition; Cepstral analysis; Data mining; Feature extraction; Fourier transforms; Mel frequency cepstral coefficient; Partial response channels; Speaker recognition; Testing; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1325926

Filename

1325926