DocumentCode
900402
Title
MLP-based phone boundary refining for a TTS database
Author
Lee, Ki-Seung
Author_Institution
Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea
Volume
14
Issue
3
fYear
2006
fDate
5/1/2006
Firstpage
981
Lastpage
989
Abstract
The automatic labeling of a large speech corpus plays an important role in developing a high-quality Text-To-Speech (TTS) synthesis system. This paper describes a method for automatically labeling speech signals, aimed mainly at constructing a large database for a TTS synthesis system. The main objective is to refine initial phone-boundary estimates obtained from a Hidden Markov Model (HMM)-based alignment. A multilayer perceptron (MLP) was employed to refine the phone boundaries. To increase the accuracy of phoneme segmentation, several specialized MLPs were trained individually, each for a different class of phonetic transition. The optimum partitioning of the entire phonetic transition space, and the corresponding MLPs, were determined so as to minimize the overall deviation from the hand-labeled boundary positions. The experimental results showed that more than 93% of all phone boundaries deviated from the reference position by less than 20 ms. Subjective listening tests also confirmed that the database constructed using the proposed method produced results perceptually comparable to those of a hand-labeled database.
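The accuracy figure quoted in the abstract (the share of phone boundaries lying within 20 ms of the reference position) can be computed roughly as sketched below. This is a minimal illustration, not the author's evaluation code: the function name `fraction_within_tolerance`, the one-to-one pairing of estimated and reference boundaries, and the sample boundary times are all hypothetical assumptions.

```python
from typing import Sequence


def fraction_within_tolerance(
    estimated_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> float:
    """Return the share of boundaries with |estimate - reference| < tolerance_ms.

    Hypothetical helper: assumes boundaries are paired one-to-one
    (same phone sequence) and that all times are in milliseconds.
    """
    if len(estimated_ms) != len(reference_ms):
        raise ValueError("boundary lists must be the same length")
    if not estimated_ms:
        raise ValueError("no boundaries to evaluate")
    # Count boundaries whose deviation is strictly smaller than the tolerance.
    hits = sum(
        1
        for est, ref in zip(estimated_ms, reference_ms)
        if abs(est - ref) < tolerance_ms
    )
    return hits / len(estimated_ms)


if __name__ == "__main__":
    # Hypothetical boundary times (ms), for illustration only.
    refined = [102.0, 215.0, 330.0, 471.0]
    hand_labeled = [100.0, 240.0, 328.0, 465.0]
    print(fraction_within_tolerance(refined, hand_labeled))  # prints 0.75
```

Here three of the four made-up boundaries deviate by less than 20 ms (2, 2, and 6 ms), while one deviates by 25 ms, giving 0.75. The strict inequality mirrors the abstract's wording "smaller than 20 ms".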
Keywords
database management systems; hidden Markov models; multilayer perceptrons; speech synthesis; hidden Markov model; multilayer perceptron; phone boundary refining; speech corpus; speech signals automatic labeling; text-to-speech synthesis system database; Automatic speech recognition; Databases; Hidden Markov models; Labeling; Multilayer perceptrons; Signal processing; Signal synthesis; Speech synthesis; Testing; Viterbi algorithm; Automatic labeling; multilayer perceptron; phoneme boundary refinement; text-to-speech synthesis;
fLanguage
English
Journal_Title
IEEE Transactions on Audio, Speech, and Language Processing
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TSA.2005.858049
Filename
1621210