Regional Information Center for Science and Technology - MLP-based phone boundary refining for a TTS database

DocumentCode :

900402

Title :

MLP-based phone boundary refining for a TTS database

Author :

Lee, Ki-Seung

Author_Institution :

Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea

Volume :

Issue :

fYear :

2006

fDate :

5/1/2006

Firstpage :

981

Lastpage :

989

Abstract :

The automatic labeling of a large speech corpus plays an important role in the development of a high-quality text-to-speech (TTS) synthesis system. This paper describes a method for automatically labeling speech signals, aimed mainly at constructing a large database for a TTS synthesis system. The main objective of the work is to refine the initial estimates of phone boundaries provided by a hidden Markov model (HMM)-based alignment. A multilayer perceptron (MLP) was employed to refine the phone boundaries. To increase the accuracy of phoneme segmentation, several specialized MLPs were trained individually, each for a class of phonetic transitions. The optimum partitioning of the entire phonetic transition space and the corresponding MLPs were constructed so as to minimize the overall deviation from the hand-labeled boundary positions. Experimental results showed that more than 93% of all phone boundaries deviated from the reference position by less than 20 ms. Subjective listening tests also confirmed that a database constructed using the proposed method produced results perceptually comparable to those of a hand-labeled database.
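
The accuracy figure quoted in the abstract (the share of phone boundaries that fall within 20 ms of a reference position) can be illustrated with a minimal sketch. This is not the paper's code: the function name `boundary_accuracy`, its parameters, and the sample boundary times are hypothetical, chosen only to show how such a tolerance-based measure might be computed.

```python
from typing import Sequence


def boundary_accuracy(
    predicted_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> float:
    """Return the fraction of boundaries whose absolute deviation from
    the reference position is strictly smaller than tolerance_ms.

    Hypothetical helper: the name, signature, and pairing of boundaries
    by index are assumptions made for this illustration.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference lists must have equal length")
    if len(reference_ms) == 0:
        raise ValueError("at least one boundary is required")

    # Count boundaries that deviate by less than the tolerance.
    hits = sum(
        1
        for p, r in zip(predicted_ms, reference_ms)
        if abs(p - r) < tolerance_ms
    )
    return hits / len(reference_ms)


# Made-up boundary times in milliseconds (not data from the paper).
predicted = [100.0, 250.0, 410.0, 620.0]
reference = [105.0, 245.0, 440.0, 615.0]
accuracy = boundary_accuracy(predicted, reference)
print(f"{accuracy:.0%} of boundaries are within 20 ms of the reference")
```

In this made-up example three of the four boundaries deviate by 5 ms and one by 30 ms, so the measure is 75%. The comparison is strict to match the abstract's wording ("smaller than 20 ms").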

Keywords :

database management systems; hidden Markov models; multilayer perceptrons; speech synthesis; hidden Markov model; multilayer perceptron; phone boundary refining; speech corpus; speech signals automatic labeling; text-to-speech synthesis system database; Automatic speech recognition; Databases; Labeling; Signal processing; Signal synthesis; Testing; Viterbi algorithm; Automatic labeling; phoneme boundary refinement; text-to-speech synthesis

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TSA.2005.858049

Filename :

1621210

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=900402