DocumentCode
310574
Title
Speaker normalization based on frequency warping
Author
Zhan, Puming ; Westphal, Martin
Author_Institution
Interactive Syst. Labs., Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume
2
fYear
1997
fDate
21-24 Apr 1997
Firstpage
1039
Abstract
In speech recognition, the speaker-dependence of a recognition system comes from the speaker-dependence of the speech feature. Variation in vocal tract shape is the major source of inter-speaker variation in the speech feature, though other sources also contribute. In this paper, we address an approach to speaker normalization that aims at normalizing the speaker's vocal tract length based on frequency warping (FWP). The FWP is implemented in the front-end preprocessing of our speech recognition system. We investigate formant-based and ML-based FWP in linear and nonlinear warping modes, and compare them in detail. All experimental results are based on our JANUS3 large vocabulary continuous speech recognition system and the Spanish Spontaneous Scheduling Task database (SSST).
Keywords
feature extraction; maximum likelihood estimation; speech processing; speech recognition; JANUS3 large vocabulary continuous speech recognition system; Spanish Spontaneous Scheduling Task database; formant-based frequency warping; frequency warping; front-end preprocessing; inter-speaker variations; linear warping modes; maximum-likelihood-based frequency warping; nonlinear warping modes; speaker normalization; speech feature; speech recognition; vocal tract length; vocal tract shape; Context modeling; Databases; Frequency; Interactive systems; Laboratories; Nonlinear distortion; Shape; Speech enhancement; Speech recognition; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location
Munich
ISSN
1520-6149
Print_ISBN
0-8186-7919-0
Type
conf
DOI
10.1109/ICASSP.1997.596118
Filename
596118