DocumentCode :
394241
Title :
Context-adaptive phone boundary refining for a TTS database
Author :
Lee, Ki-Seung ; Kim, Jeongsu
Author_Institution :
Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The work focuses on refining initial estimates of phone boundaries produced by a hidden Markov model (HMM) based alignment. A multi-layer perceptron (MLP) is used as a phone boundary detector. To improve segmentation performance, a technique is proposed in which separate MLPs are trained according to the type of phonetic transition. The partitioning of the entire phonetic transition space is chosen to minimize the overall deviation from hand-labelled boundary positions. In experiments with single-speaker data, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and the boundary refinement reduced the root mean square error by about 25%.
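The two figures quoted in the abstract (share of boundaries within 20 ms of the reference, and the reduction in root mean square error) can be illustrated with a minimal Python sketch. The function name `boundary_metrics` and all boundary times below are hypothetical and chosen only for illustration; they are not the authors' code or data, and the paper's exact evaluation procedure may differ.

```python
import math


def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries within tolerance, RMS deviation in ms)."""
    if len(predicted_ms) != len(reference_ms) or not predicted_ms:
        raise ValueError("need two non-empty, equal-length boundary lists")
    # Signed deviation of each automatic boundary from its hand-labelled reference.
    deviations = [p - r for p, r in zip(predicted_ms, reference_ms)]
    # "Smaller than 20 ms" is taken as a strict inequality on the absolute deviation.
    within = sum(1 for d in deviations if abs(d) < tolerance_ms)
    rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return within / len(deviations), rmse


# Hypothetical boundary times in milliseconds (not data from the paper).
reference = [100.0, 250.0, 400.0, 610.0]  # hand-labelled positions
initial = [112.0, 238.0, 425.0, 604.0]    # e.g. initial HMM alignment
refined = [106.0, 244.0, 410.0, 607.0]    # e.g. after boundary refinement

acc_initial, rmse_initial = boundary_metrics(initial, reference)
acc_refined, rmse_refined = boundary_metrics(refined, reference)

# Relative RMS error reduction achieved by the refinement step.
reduction = 1.0 - rmse_refined / rmse_initial

print(f"within 20 ms: {acc_initial:.0%} -> {acc_refined:.0%}")
print(f"RMSE: {rmse_initial:.2f} ms -> {rmse_refined:.2f} ms ({reduction:.0%} lower)")
```

Reporting both numbers is useful because the tolerance-based share ignores how far outliers fall beyond 20 ms, whereas the RMS error is sensitive to them.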
Keywords :
adaptive signal processing; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; speech synthesis; HMM; MLP training; RMS error; TTS database; automatic speech signals segmentation; boundary deviation; context-adaptive phone boundary refining; hand labelling positions; hidden Markov model; multilayer perceptron; optimum partitioning; phone boundaries estimation; phone boundary detector; phonetic transition; phonetic transition space; reference position; root mean square error; text-to-speech synthesis system; Automatic speech recognition; Databases; Detectors; Hidden Markov models; Labeling; Linear predictive coding; Multilayer perceptrons; Root mean square; Signal synthesis; Speech synthesis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198765
Filename :
1198765