Regional Information Center for Science and Technology - Building speech synthesis systems for Indian languages

DocumentCode :

2428392

Title :

Building speech synthesis systems for Indian languages

Author :

Pradhan, Abhijit ; Prakash, Anusha ; Aswin Shanmugam, S. ; Kasthuri, G.R. ; Krishnan, Raghava ; Murthy, Hema A.

Author_Institution :

Dept. of Comput. Sci. & Eng., IIT Madras, Chennai, India

fYear :

2015

fDate :

Feb. 27-March 1, 2015

Firstpage :

Lastpage :

Abstract :

In this paper, new efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented. The synthesisers are built around both concatenative speech synthesis and statistical parametric speech synthesis frameworks. Text-to-speech synthesis systems require accurate segmentation, and obtaining accurate segmentation at the phone level is a difficult task. Manual segmentation leads to human errors, while automatic segmentation using statistical approaches (hidden Markov model based approaches) leads to poor boundary information when the amount of training data is small. A semi-automatic, group delay based syllable segmentation tool is discussed. The tool is semi-automatic because some of the boundaries obtained are inaccurate and have to be manually corrected. Next, a segmentation algorithm that combines HMM based segmentation and group delay based segmentation is used to obtain accurate boundaries automatically. The boundaries obtained are used in the syllable-based synthesiser for unit selection. In the statistical phone-based synthesiser, embedded reestimation is performed at the phone level. Syllable-based and penta-phone based HMMs are used for building the synthesiser. TTS systems for 12 Indian languages, namely Tamil, Hindi, Marathi, Malayalam, Telugu, Rajasthani, Bengali, Odia, Assamese, Manipuri, Kannada and Gujarati, are built using semi-automatic segmentation, and synthesisers have been built for 7 Indian languages using automatic segmentation. Evaluation of the semi-automatic segmentation systems indicates that the MOS (mean opinion score) is above 3.0 for most of the languages. Pair comparison tests on semi-automatic vs. automatic segmentation show that automatic segmentation is preferred.

Keywords :

hidden Markov models; natural language processing; speech synthesis; HMM based segmentation; Indian languages; TTS; automatic segmentation; concatenative speech synthesis; group delay based syllable segmentation semi-automatic tool; hidden Markov model based approaches; penta-phone based HMM; statistical approaches; statistical parametric speech synthesis frameworks; statistical phone-based synthesiser; syllable-based synthesiser; text-to-speech synthesis systems; segmentation; statistical parametric synthesis; syllable-based speech synthesis; text-to-speech synthesis; unit selection synthesis

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communications (NCC), 2015 Twenty First National Conference on

Conference_Location :

Mumbai

Type :

conf

DOI :

10.1109/NCC.2015.7084931

Filename :

7084931

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2428392