Title :
Minimum segmentation error based discriminative training for speech synthesis application
Author :
Wu, Yi-Jian ; Kawai, Hisashi ; Ni, Jinfu ; Wang, Ren-Hua
Author_Institution :
Spoken Language Translation Labs., ATR, Kyoto, Japan
Abstract :
In the conventional HMM-based segmentation method, HMMs are trained under the maximum likelihood estimation (MLE) criterion, which ties the segmentation task to a distribution-estimation problem. The HMMs are built to identify phonetic segments, not to detect the boundaries between them. This inconsistency between training and application limits segmentation performance. In this paper, we adopt discriminative training and introduce a new HMM training criterion named minimum segmentation error (MSGE). In this method, a loss function directly related to the segmentation error is defined. By minimizing the overall empirical loss with the generalized probabilistic descent (GPD) algorithm, the segmentation error is also minimized. Experiments on both Chinese and Japanese data show improved segmentation accuracy. Moreover, the method remains robust even when knowledge of HMM modeling is incomplete, e.g., when the number of states is not optimized.
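The central idea of the abstract, replacing a likelihood objective with a smoothed loss tied to boundary error and minimizing it with GPD, can be illustrated with a toy sketch. This is not the authors' MSGE formulation. It assumes a hypothetical two-segment model with one unit-variance Gaussian per segment (standing in for the HMMs), a single boundary b, a hypothetical smoothed loss equal to the expected boundary error |b - b*| under a softmax posterior P(b) proportional to exp(eta * g(b)), and a GPD-style update (gradient descent with a decreasing step size eps0 / (1 + t / tau)). All function names and parameters (eta, eps0, tau) are illustrative.

```python
import math

# Toy illustration only: a two-segment "model" with one unit-variance Gaussian
# per segment stands in for the HMMs; all names here are hypothetical.

def softmax(scores, eta):
    """P(b) proportional to exp(eta * g(b)), computed with max-subtraction for stability."""
    m = max(eta * s for s in scores)
    w = [math.exp(eta * s - m) for s in scores]
    z = sum(w)
    return [v / z for v in w]

def boundary_scores(x, mu1, mu2):
    """For each boundary b in 1..T-1, return g(b) (log-likelihood up to a constant,
    frames x[:b] from mean mu1, x[b:] from mean mu2) and (dg/dmu1, dg/dmu2)."""
    scores, grads = [], []
    for b in range(1, len(x)):
        left, right = x[:b], x[b:]
        g = (-0.5 * sum((v - mu1) ** 2 for v in left)
             - 0.5 * sum((v - mu2) ** 2 for v in right))
        scores.append(g)
        grads.append((sum(v - mu1 for v in left), sum(v - mu2 for v in right)))
    return scores, grads

def seg_loss_and_grad(x, b_true, mu1, mu2, eta):
    """Hypothetical smoothed segmentation loss: expected |b - b_true| under P(b)."""
    scores, grads = boundary_scores(x, mu1, mu2)
    post = softmax(scores, eta)
    errs = [abs(b - b_true) for b in range(1, len(x))]
    loss = sum(p * e for p, e in zip(post, errs))
    # dL/dtheta = eta * sum_b P(b) * (e(b) - L) * dg(b)/dtheta
    g1 = eta * sum(p * (e - loss) * d1 for p, e, (d1, _) in zip(post, errs, grads))
    g2 = eta * sum(p * (e - loss) * d2 for p, e, (_, d2) in zip(post, errs, grads))
    return loss, g1, g2

def gpd_train(x, b_true, mu1, mu2, eta=1.0, eps0=0.1, tau=50.0, steps=200):
    """GPD-style descent on one utterance with a decreasing step size."""
    for t in range(steps):
        _, g1, g2 = seg_loss_and_grad(x, b_true, mu1, mu2, eta)
        eps = eps0 / (1.0 + t / tau)
        mu1, mu2 = mu1 - eps * g1, mu2 - eps * g2
    return mu1, mu2

def decode(x, mu1, mu2):
    """Hard segmentation: the boundary b with the highest score g(b)."""
    scores, _ = boundary_scores(x, mu1, mu2)
    return 1 + max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    x = [0.1, -0.2, 0.0, 0.3, -0.1, 1.2, 0.9, 1.1, 0.8, 1.0]  # reference boundary at 5
    loss0, _, _ = seg_loss_and_grad(x, 5, 0.5, 0.6, 1.0)
    mu1, mu2 = gpd_train(x, 5, 0.5, 0.6)
    loss1, _, _ = seg_loss_and_grad(x, 5, mu1, mu2, 1.0)
    print(f"loss {loss0:.3f} -> {loss1:.3f}, mu1={mu1:.2f}, mu2={mu2:.2f}, "
          f"boundary={decode(x, mu1, mu2)}")
```

Because the loss is computed from the boundary error itself, its gradient rewards parameter changes that make the reference boundary outscore competing boundaries, rather than changes that merely improve how well each segment is fitted. That is the training/application mismatch the abstract attributes to MLE.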
Keywords :
hidden Markov models; minimisation; speech synthesis; Chinese data; GPD algorithm; HMM training; Japanese data; MSGE; discriminative training; generalized probabilistic descent algorithm; loss function; minimum segmentation error; speech synthesis; Hidden Markov models; Iterative algorithms; Labeling; Laboratories; Maximum likelihood estimation; Natural languages; Optimization methods; Robustness; Speech recognition; Speech synthesis;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1326064