Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation

Author

Chan, Wai Nang ; Zheng, Nengheng ; Lee, Tan

Author_Institution

Chinese Univ. of Hong Kong, Hong Kong

Volume

15

Issue

6

fYear

2007

Firstpage

1884

Lastpage

1892

Abstract

This paper presents an analysis of the speaker discrimination power of vocal source related features, in comparison with the conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by a pitch-synchronous wavelet transform of the linear predictive (LP) residual signals. A series of controlled experiments shows that WOCOR is less sensitive to spoken content than the conventional Mel-frequency cepstral coefficient (MFCC) features and is thus more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in the task of speaker segmentation for telephone conversations, in which statistical speaker models need to be built from short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction of segmentation errors.

Keywords

speech processing; statistical analysis; linear predictive residual signals; pitch-synchronous wavelet transform; segmentation errors reduction; speaker segmentation; statistical speaker; telephone conversation; training data; vocal source power discrimination; vocal tract related features; wavelet octave coefficients; Acoustic testing; Cepstral analysis; Data mining; Feature extraction; Loudspeakers; Mel frequency cepstral coefficient; Speaker recognition; Speech; Telephony; Speaker discrimination power; vocal source features; vocal tract features

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2007.900103

Filename

4276747