Regional Information Center for Science and Technology - Modeling segmental duration for Turkish text-to-speech

DocumentCode :

3616068

Title :

Modeling segmental duration for Turkish text-to-speech

Author :

O. Ozturk;T. Ciloglu

Author_Institution :

Dokuz Eylul Univ., Izmir, Turkey

fYear :

2004

fDate :

2004

Firstpage :

272

Lastpage :

275

Abstract :

Text-to-speech (TTS) synthesis can be regarded as the automatic transformation of sentences from text form into a speech waveform by machines. The most crucial problem confronting TTS systems is the generation of natural-sounding speech. To obtain natural-sounding synthetic speech, prosodic attributes such as pitch frequency, duration and intensity should be modeled appropriately. This paper summarizes efforts to obtain duration models for Turkish TTS systems via machine-learning algorithms. In natural speech, segment durations are highly correlated with context: similar or identical phones differ in energy, duration and fundamental frequency depending on their context. To obtain natural speech through TTS, these context-dependent prosodic variations should be modeled. Different methods of modeling duration have been applied over the years. Two corpus-based statistical methods, linear regression and the C4.5 decision tree, are employed here to model segment durations in Turkish.

Keywords :

"Speech synthesis","Linear regression","Frequency","Natural languages","Context modeling","Decision trees","Stress"

Publisher :

IEEE

Conference_Title :

Proceedings of the IEEE 12th Signal Processing and Communications Applications Conference, 2004

Print_ISBN :

0-7803-8318-4

Type :

conf

DOI :

10.1109/SIU.2004.1338312

Filename :

1338312

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3616068