DocumentCode :
310988
Title :
An auditory-based measure for improved phone segment concatenation
Author :
Chappell, David T.; Hansen, John H. L.
Author_Institution :
Robust Speech Process. Lab., Duke Univ., Durham, NC, USA
Volume :
3
fYear :
1997
fDate :
21-24 Apr 1997
Firstpage :
1639
Abstract :
This paper describes a new auditory-based distance measure intended for use in a concatenated synthesis technique wherein the time- and frequency-domain characteristics are used to perform natural-sounding speaker synthesis. Whereas most concatenation systems use large databases (often more than 100,000 units), we begin from a small, limited database (approx. 400 units) and use a new spectral distortion measure to aid in the selection of phones for optimal concatenation. At the transition between speech segments, the new auditory-based distance metric assesses perceived discontinuities in the frequency domain. The distortion measure, which employs the Carney auditory model (see J. Acoust. Soc. Am., vol. 93, pp. 401-417, 1993), is used to select phones that minimize the perceived distortion between concatenated segments. Moreover, time- and frequency-domain methods can shape the prosodic and spectral characteristics of each speech segment. The final results demonstrate improved performance over standard concatenation methods applied to small databases.
Keywords :
frequency-domain analysis; hearing; spectral analysis; speech intelligibility; speech processing; speech synthesis; time-domain analysis; auditory based distance measure; auditory model; concatenated synthesis; concatenation systems; distortion measure; frequency-domain characteristics; natural sounding speaker synthesis; optimal concatenation; perceived discontinuities; perceived distortion; performance; phone segment concatenation; prosodic characteristics; small databases; spectral characteristics; spectral distortion measure; speech segment; speech segments transition; time-domain characteristics; Concatenated codes; Databases; Electronic mail; Frequency domain analysis; Labeling; Laboratories; Robustness; Speech processing; Speech synthesis; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
ISSN :
1520-6149
Print_ISBN :
0-8186-7919-0
Type :
conf
DOI :
10.1109/ICASSP.1997.598820
Filename :
598820