Title :
Modeling segmental duration in German text-to-speech synthesis
Author :
Möbius, Bernd ; van Santen, J.
Author_Institution :
Dept. of Speech Synthesis Res., Bell Labs., Murray Hill, NJ, USA
Abstract :
The paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has been implemented in the German version of the Bell Labs text-to-speech system (R. Sproat and J. Olive, 1995; B. Möbius et al., 1996). The construction of the duration system was made efficient by the use of an interactive statistical analysis package that incorporates the approach outlined by J.P.H. van Santen (1994). The results are stored in tables in a format that can be directly interpreted by the TTS duration module. Tables are constructed in two phases: inferential statistical analysis of the speech corpus, and parameter estimation. The overall correlation between observed and predicted segmental durations is 0.896.
Keywords :
interactive systems; natural languages; parameter estimation; speech processing; speech synthesis; statistical analysis; Bell Labs text to speech system; German text to speech synthesis; TTS duration module; inferential statistical analysis; interactive statistical analysis package; parameter estimation; predicted segmental durations; segmental contexts; segmental duration modeling; speech corpus; speech sounds; Context modeling; Extrapolation; Frequency estimation; Natural languages; Predictive models; Spatial databases; Speech analysis; Speech synthesis; Statistical analysis; Training data;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607291