Title :
Speech segment selection for concatenative synthesis based on prosody-aligned distance measure
Author :
Kuo, Chih-Chung ; Kuo, Chi-Shiang
Author_Institution :
Computer & Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
Abstract :
This paper presents a new method for automatically selecting the speech segments expected to minimize perceptual distortion in concatenative synthesis. The method compares candidate segments after each has been fully prosody-aligned to the others. Automatic segmentation, pitch marking, and the PSOLA method are combined to perform the prosody alignment. Two distance measures, one based on MFCC and the other on PSQM (Perceptual Speech Quality Measure), are used for the comparison because both reflect human perception. Experiments show that the average distortion obtained with the selected best unit in outside testing (on data outside the training corpus) is similar to that in the training corpus, with only a few exceptions. The symmetry and correlation of the two distance measures are also studied; the results show that both are reasonably symmetric and consistent with each other in most cases.
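The selection step described in the abstract can be illustrated with a minimal sketch. It assumes, without confirmation from the paper, that the "best unit" is the candidate with the lowest average distance to all other candidates, and that the pairwise distances (MFCC-based or PSQM) have already been computed after prosody alignment. The names `select_best_unit` and `max_asymmetry`, and the matrix values, are hypothetical and chosen only for illustration.

```python
def select_best_unit(dist):
    """Return (index, average distance) of the candidate whose mean
    distance to every other candidate is smallest.

    dist[i][j] is assumed to hold the distortion measured between
    candidate i and candidate j after prosody alignment.
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two candidates")
    best_index, best_avg = -1, float("inf")
    for i in range(n):
        # Average over all other candidates, skipping the diagonal.
        avg = sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        if avg < best_avg:
            best_index, best_avg = i, avg
    return best_index, best_avg


def max_asymmetry(dist):
    """Largest |dist[i][j] - dist[j][i]| over all candidate pairs.

    A value near zero suggests the measure behaves symmetrically.
    """
    n = len(dist)
    return max(
        (abs(dist[i][j] - dist[j][i]) for i in range(n) for j in range(i + 1, n)),
        default=0.0,
    )


# Hypothetical distances among three candidates of one unit type.
example = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
]
print(select_best_unit(example))  # (1, 1.5)
print(max_asymmetry(example))     # 0.0
```

Averaging over the other candidates is only one plausible selection criterion; the paper may define the criterion differently.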
Keywords :
Distortion measurement; Frequency measurement; Geometry; Humans; Mel frequency cepstral coefficient; Speech; Training
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.2002.5743757