DocumentCode :
3594346
Title :
Speech segment selection for concatenative synthesis based on prosody-aligned distance measure
Author :
Kuo, Chih-Chung ; Kuo, Chi-Shiang
Author_Institution :
Computer & Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
Volume :
1
fYear :
2002
Abstract :
This paper presents a new method for automatically selecting the speech segments that are expected to minimize perceptual distortion in synthesis. The method compares candidate segments after they have been fully prosody-aligned to each other. Automatic segmentation, pitch marking, and the PSOLA method work together to perform the prosody alignment. Two distance measures, MFCC and PSQM, are used for the comparison because both take human perception into account. The experiment shows that the average distortion obtained with the selected best unit in outside testing is similar to that on the training corpus, with only a few exceptions. The symmetry and correlation of the two distance measures are also studied; the results show that both are properly symmetric and consistent with each other in most cases.
Keywords :
Distortion measurement; Frequency measurement; Geometry; Humans; Mel frequency cepstral coefficient; Speech; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7402-9
Type :
conf
DOI :
10.1109/ICASSP.2002.5743757
Filename :
5743757