DocumentCode
1060063
Title
Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation
Author
Chan, Wai Nang ; Zheng, Nengheng ; Lee, Tan
Author_Institution
Chinese Univ. of Hong Kong, Hong Kong
Volume
15
Issue
6
fYear
2007
Firstpage
1884
Lastpage
1892
Abstract
This paper presents an analysis of the speaker discrimination power of vocal source related features, in comparison to the conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by a pitch-synchronous wavelet transform of the linear predictive (LP) residual signals. Using a series of controlled experiments, it is shown that WOCOR is less sensitive to spoken content than the conventional MFCC features and thus more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in the task of speaker segmentation for telephone conversations, in which statistical speaker models need to be built upon short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction in segmentation errors.
Keywords
speech processing; statistical analysis; linear predictive residual signals; pitch-synchronous wavelet transform; segmentation errors reduction; speaker segmentation; statistical speaker; telephone conversation; training data; vocal source power discrimination; vocal tract related features; wavelet octave coefficients; Acoustic testing; Cepstral analysis; Data mining; Feature extraction; Loudspeakers; Mel frequency cepstral coefficient; Speaker recognition; Speech; Telephony; Training data; Speaker discrimination power; speaker segmentation; vocal source features; vocal tract features;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2007.900103
Filename
4276747