DocumentCode :
900402
Title :
MLP-based phone boundary refining for a TTS database
Author :
Lee, Ki-Seung
Author_Institution :
Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea
Volume :
14
Issue :
3
fYear :
2006
fDate :
5/1/2006
Firstpage :
981
Lastpage :
989
Abstract :
The automatic labeling of a large speech corpus plays an important role in the development of a high-quality Text-To-Speech (TTS) synthesis system. This paper describes a method for the automatic labeling of speech signals, aimed mainly at the construction of a large database for a TTS synthesis system. The main objective of the work is to refine the initial estimates of phone boundaries provided by an alignment based on a Hidden Markov Model. A multilayer perceptron (MLP) was employed to refine the phone boundaries. To increase the accuracy of phoneme segmentation, several specialized MLPs were individually trained according to phonetic transition. The optimum partitioning of the entire phonetic transition space and the corresponding MLPs were constructed so as to minimize the overall deviation from the hand-labeled positions. The experimental results showed that more than 93% of all phone boundaries deviate from the reference position by less than 20 ms. Subjective listening tests also confirmed that the database constructed using the proposed method produced results that were perceptually comparable to those of a hand-labeled database.
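The accuracy figure quoted in the abstract (the share of phone boundaries that fall within 20 ms of the reference position) can be illustrated with a minimal sketch. The function name `boundary_accuracy`, its parameters, and the sample boundary times are hypothetical and are not taken from the paper; the sketch only assumes that automatic and hand-labeled boundaries are paired one-to-one and expressed in milliseconds.

```python
from typing import Sequence


def boundary_accuracy(
    auto_ms: Sequence[float],
    ref_ms: Sequence[float],
    tol_ms: float = 20.0,
) -> float:
    """Return the fraction of boundaries whose absolute deviation
    from the reference position is smaller than tol_ms.

    Hypothetical helper: assumes auto_ms[i] and ref_ms[i] describe
    the same phone boundary.
    """
    if len(auto_ms) != len(ref_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("no boundaries to score")
    # Count boundaries that deviate by less than the tolerance.
    hits = sum(1 for a, r in zip(auto_ms, ref_ms) if abs(a - r) < tol_ms)
    return hits / len(auto_ms)


# Made-up boundary times (ms), for illustration only.
auto = [100, 250, 400, 610]
ref = [105, 245, 430, 600]
print(boundary_accuracy(auto, ref))
```

In this made-up example the deviations are 5, 5, 30, and 10 ms, so three of the four boundaries fall inside the 20 ms tolerance.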
Keywords :
database management systems; hidden Markov models; multilayer perceptrons; speech synthesis; hidden Markov model; multilayer perceptron; phone boundary refining; speech corpus; speech signals automatic labeling; text-to-speech synthesis system database; Automatic speech recognition; Databases; Hidden Markov models; Labeling; Multilayer perceptrons; Signal processing; Signal synthesis; Speech synthesis; Testing; Viterbi algorithm; Automatic labeling; multilayer perceptron; phoneme boundary refinement; text-to-speech synthesis;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TSA.2005.858049
Filename :
1621210