DocumentCode
394241
Title
Context-adaptive phone boundary refining for a TTS database
Author
Lee, Ki-Seung ; Kim, Jeongsu
Author_Institution
Dept. of Electron. Eng, Konkuk Univ., Seoul, South Korea
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a Text-To-Speech (TTS) synthesis system. The work focuses on refining initial phone boundary estimates obtained from an alignment based on a Hidden Markov Model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, a technique is proposed that trains a separate MLP for each class of phonetic transition. The entire phonetic transition space is partitioned so as to minimize the overall deviation from the hand-labelled boundary positions. On single-speaker data, the experimental results show that more than 95% of all phone boundaries deviate from the reference position by less than 20 ms, and that refining the boundaries reduces the root-mean-square (RMS) error by about 25%.
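The two figures of merit quoted in the abstract (the share of boundaries within 20 ms of the reference and the relative RMS error reduction) can be computed as in the minimal sketch below. It assumes boundaries are given as times in milliseconds, paired one-to-one with hand-labelled reference boundaries; the function names are hypothetical and the paper's exact evaluation procedure is not given in this record.

```python
import math


def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries within tolerance, RMS error in ms).

    Hypothetical helper: assumes predicted and reference boundaries are
    already paired one-to-one and expressed in milliseconds.
    """
    if len(predicted_ms) != len(reference_ms) or not predicted_ms:
        raise ValueError("need two non-empty, equal-length boundary lists")
    # Signed deviation of each estimated boundary from its reference.
    deviations = [p - r for p, r in zip(predicted_ms, reference_ms)]
    # "Smaller than 20 ms": strict inequality on the absolute deviation.
    within = sum(1 for d in deviations if abs(d) < tolerance_ms) / len(deviations)
    rms = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return within, rms


def relative_reduction(rms_before, rms_after):
    """Fractional RMS error reduction, e.g. 0.25 for a 25% reduction."""
    return (rms_before - rms_after) / rms_before
```

For example, with toy (not reported) values, `boundary_metrics([10, 25, 40], [0, 20, 70])` yields 2/3 of boundaries within 20 ms, and `relative_reduction(20.0, 15.0)` gives 0.25.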
Keywords
adaptive signal processing; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; speech synthesis; HMM; MLP training; RMS error; TTS database; automatic speech signals segmentation; boundary deviation; context-adaptive phone boundary refining; hand labelling positions; hidden Markov model; multilayer perceptron; optimum partitioning; phone boundaries estimation; phone boundary detector; phonetic transition; phonetic transition space; reference position; root mean square error; text-to-speech synthesis system; Automatic speech recognition; Databases; Detectors; Hidden Markov models; Labeling; Linear predictive coding; Multilayer perceptrons; Root mean square; Signal synthesis; Speech synthesis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198765
Filename
1198765