  • DocumentCode
    394241
  • Title
    Context-adaptive phone boundary refining for a TTS database
  • Author
    Lee, Ki-Seung; Kim, Jeongsu
  • Author_Institution
    Dept. of Electron. Eng., Konkuk Univ., Seoul, South Korea
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    A method for the automatic segmentation of speech signals is described. The method is intended for constructing a large database for a text-to-speech (TTS) synthesis system. The work focuses on refining the initial estimates of phone boundaries provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, a technique is proposed in which MLPs are trained individually according to phonetic transition. The optimum partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from hand-labelled boundary positions. In experiments with single-speaker stimuli, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and the boundary refinement reduced the root mean square error by about 25%.
  • Keywords
    adaptive signal processing; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; speech synthesis; HMM; MLP training; RMS error; TTS database; automatic speech signals segmentation; boundary deviation; context-adaptive phone boundary refining; hand labelling positions; hidden Markov model; multilayer perceptron; optimum partitioning; phone boundaries estimation; phone boundary detector; phonetic transition; phonetic transition space; reference position; root mean square error; text-to-speech synthesis system; Automatic speech recognition; Databases; Detectors; Hidden Markov models; Labeling; Linear predictive coding; Multilayer perceptrons; Root mean square; Signal synthesis; Speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198765
  • Filename
    1198765
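
The abstract reports two evaluation figures: the share of phone boundaries whose deviation from the hand-labelled reference is below 20 ms, and the relative reduction in root mean square (RMS) error after refinement. The sketch below shows one plausible way such figures could be computed; it is not taken from the paper. The function names, the strict "less than" tolerance test, and the sample boundary times are all hypothetical and serve only as illustration.

```python
import math


def boundary_metrics(reference_ms, predicted_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries within tolerance, RMS error in ms).

    Both inputs are equal-length sequences of boundary times in
    milliseconds, paired by position. This pairing is an assumption.
    """
    if len(reference_ms) != len(predicted_ms) or len(reference_ms) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = [p - r for r, p in zip(reference_ms, predicted_ms)]
    within = sum(1 for d in deviations if abs(d) < tolerance_ms) / len(deviations)
    rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return within, rmse


def relative_rmse_reduction(rmse_before, rmse_after):
    """Fractional RMS error reduction, e.g. 0.25 for a 25% reduction."""
    if rmse_before <= 0:
        raise ValueError("rmse_before must be positive")
    return (rmse_before - rmse_after) / rmse_before


# Hypothetical data: hand labels, initial alignment, refined boundaries (ms).
reference = [100.0, 250.0, 400.0, 620.0]
initial = [112.0, 238.0, 425.0, 610.0]
refined = [104.0, 246.0, 409.0, 617.0]

within_initial, rmse_initial = boundary_metrics(reference, initial)
within_refined, rmse_refined = boundary_metrics(reference, refined)
reduction = relative_rmse_reduction(rmse_initial, rmse_refined)

print(f"initial: {within_initial:.0%} within 20 ms, RMS error {rmse_initial:.2f} ms")
print(f"refined: {within_refined:.0%} within 20 ms, RMS error {rmse_refined:.2f} ms")
print(f"relative RMS error reduction: {reduction:.0%}")
```

The sample numbers are invented and do not reproduce the results reported in the abstract.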