Context-adaptive phone boundary refining for a TTS database

Author

Lee, Ki-Seung ; Kim, Jeongsu

Author_Institution

Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a Text-To-Speech (TTS) synthesis system. The main issue addressed is the refinement of initial phone boundary estimates provided by a Hidden Markov Model (HMM)-based alignment. A multi-layer perceptron (MLP) was used as the phone boundary detector. To improve segmentation performance, a technique is proposed that trains a separate MLP for each class of phonetic transition. The partitioning of the entire phonetic transition space into these classes is optimized to minimize the overall deviation from hand-labelled boundary positions. With single-speaker stimuli, the experimental results showed that more than 95% of all phone boundaries deviate from the reference position by less than 20 ms, and that the refinement reduces the root mean square (RMS) boundary error by about 25%.
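The refinement stage described above can be pictured as a local search: each HMM-aligned boundary is moved to the nearby frame where the detector for its phonetic transition class scores highest, and the result is scored by the fraction of boundaries within 20 ms of the hand label and by the RMS error. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Detector` callable stands in for a trained MLP, and the class names, the `search_radius` of 4 frames, the `"default"` fallback class, and all numbers in the usage section are hypothetical. MLP training and the search for the optimum partition are not shown.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: a detector maps a frame index to a boundary score.
# In the paper this role is played by an MLP; here it is any callable.
Detector = Callable[[int], float]

# Hypothetical mapping from (left phone, right phone) to a transition class.
# It stands in for whatever partition of the transition space is chosen.
Partition = Dict[Tuple[str, str], str]


def refine_boundary(
    initial_frame: int,
    left: str,
    right: str,
    detectors: Dict[str, Detector],
    partition: Partition,
    search_radius: int = 4,
    n_frames: Optional[int] = None,
) -> int:
    """Move an HMM-aligned boundary to the best-scoring frame nearby."""
    # Select the detector trained for this transition class; unseen
    # transitions fall back to an assumed shared "default" detector.
    cls = partition.get((left, right), "default")
    detector = detectors.get(cls, detectors["default"])
    # Search window of +/- search_radius frames, clipped to the utterance.
    lo = max(0, initial_frame - search_radius)
    hi = initial_frame + search_radius
    if n_frames is not None:
        hi = min(hi, n_frames - 1)
    # Highest score wins; ties go to the frame closest to the HMM estimate.
    return max(
        range(lo, hi + 1),
        key=lambda t: (detector(t), -abs(t - initial_frame)),
    )


def boundary_stats(
    estimated_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> Tuple[float, float]:
    """Return (fraction of boundaries off by less than tolerance, RMS error in ms)."""
    if len(estimated_ms) != len(reference_ms) or not estimated_ms:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [e - r for e, r in zip(estimated_ms, reference_ms)]
    within = sum(1 for d in errors if abs(d) < tolerance_ms) / len(errors)
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
    return within, rmse


if __name__ == "__main__":
    # Toy detectors (illustrative only): the stop-to-vowel detector peaks
    # at frame 12, the default detector at frame 8.
    detectors = {
        "stop-vowel": lambda t: math.exp(-((t - 12) ** 2)),
        "default": lambda t: math.exp(-((t - 8) ** 2)),
    }
    partition = {("t", "a"): "stop-vowel"}
    print(refine_boundary(10, "t", "a", detectors, partition))  # 12
    print(refine_boundary(10, "m", "a", detectors, partition))  # 8

    # Made-up boundary times in ms, before and after refinement.
    reference = [60.0, 120.0]
    print(boundary_stats([50.0, 145.0], reference))
    print(boundary_stats([58.0, 125.0], reference))
```

Keeping one detector per transition class is what makes the refinement context-adaptive: the same initial boundary at frame 10 lands on different frames depending on which phones it separates. The tie-break toward the HMM estimate is a design choice of this sketch, made so that a flat detector output leaves the initial alignment unchanged.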

Keywords

adaptive signal processing; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; speech synthesis; HMM; MLP training; RMS error; TTS database; automatic speech signals segmentation; boundary deviation; context-adaptive phone boundary refining; hand labelling positions; hidden Markov model; multilayer perceptron; optimum partitioning; phone boundaries estimation; phone boundary detector; phonetic transition; phonetic transition space; reference position; root mean square error; text-to-speech synthesis system; Automatic speech recognition; Databases; Detectors; Hidden Markov models; Labeling; Linear predictive coding; Multilayer perceptrons; Root mean square; Signal synthesis; Speech synthesis

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198765

Filename

1198765