Regional Information Center for Science and Technology - Speech segment selection for concatenative synthesis based on prosody-aligned distance measure

DocumentCode :

3594346

Title :

Speech segment selection for concatenative synthesis based on prosody-aligned distance measure

Author :

Kuo, Chih-Chung ; Kuo, Chi-Shiang

Author_Institution :

Computer & Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan

Volume :

fYear :

2002

Abstract :

This paper presents a new method for automatically selecting the speech segments expected to minimize perceptual distortion in synthesis. The method compares candidate segments after they have been fully prosody-aligned to each other. Automatic segmentation, pitch marking, and the PSOLA method together perform the prosody alignment. Two distance measures, MFCC and PSQM, are used for the comparison because both reflect human perception. Experiments show that the average distortion obtained with the selected best unit in outside testing is similar to that in the training corpus, with only a few exceptions. The symmetry and correlation of the two distance measures are also studied; the results show that both are reasonably symmetric and consistent with each other in most cases.
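
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pairwise distances between prosody-aligned candidates (for example MFCC or PSQM values) have already been computed and stored in a square matrix, and that the "best unit" is the candidate with the lowest mean distance to all the others. The function names `select_best_unit` and `asymmetry`, and the toy numbers, are hypothetical.

```python
from statistics import mean


def select_best_unit(dist):
    """Return the index of the candidate with the lowest mean distance
    to every other candidate (assumed selection criterion).

    dist[i][j] is a precomputed distance between candidates i and j
    after prosody alignment; how it is computed is outside this sketch.
    """
    n = len(dist)
    if n == 0:
        raise ValueError("no candidates")
    if n == 1:
        return 0
    avg = [mean(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    return min(range(n), key=avg.__getitem__)


def asymmetry(dist):
    """Mean relative difference between d(i, j) and d(j, i) over all pairs.

    0.0 means the measure is perfectly symmetric on this data.
    """
    n = len(dist)
    diffs = []
    for i in range(n):
        for j in range(i + 1, n):
            total = dist[i][j] + dist[j][i]
            diffs.append(abs(dist[i][j] - dist[j][i]) / total if total else 0.0)
    return mean(diffs) if diffs else 0.0


# Made-up distances for three candidates of one unit (illustration only).
toy = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
best = select_best_unit(toy)
sym = asymmetry(toy)
```

With the toy matrix, the mean distances are 2.5, 1.5, and 3.0, so the second candidate (index 1) is selected, and the asymmetry is 0.0 because the matrix is symmetric.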

Keywords :

Distortion measurement; Frequency measurement; Geometry; Humans; Mel frequency cepstral coefficient; Speech; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-7402-9

Type :

conf

DOI :

10.1109/ICASSP.2002.5743757

Filename :

5743757

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3594346