Regional Information Center for Science and Technology - Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages

DocumentCode :

2986669

Title :

Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages

Author :

Ramani, B. ; Actlin Jeeva, M.P. ; Vijayalakshmi, P. ; Nagarajan, T.

Author_Institution :

SSN Coll. of Eng., Chennai, India

fYear :

2013

fDate :

22-25 Oct. 2013

Firstpage :

Lastpage :

Abstract :

A multilingual text-to-speech (TTS) system synthesizes, for a given text, a speech signal in multiple languages that is intelligible to a human listener. However, when mixed-language text is given to the system, the synthesized output exhibits speaker switching at the language-switching points, which is annoying to listeners. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates synthesized speech in multiple languages with a single voice identity. This can be achieved either by inherent voice conversion during synthesis or by using voice conversion to convert the multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using a hidden Markov model (HMM)-based synthesis technique. Here, speech data collected from native speakers of the Indian languages Telugu, Malayalam, and Hindi are converted to have the voice identity of the native Tamil speaker. Building an HMM-based synthesizer using the obtained polyglot corpus enables the system to synthesize speech for any given text in any of the languages, or for mixed-language text. The performance of the polyglot speech synthesizer is evaluated for the similarity of the synthesized speech to the source or target speaker by performing an ABX listening test. The scores obtained show that the percentage of similarity to the target Tamil speaker varies from 73% to 86%. Further, the performance of the system is analyzed for speaker switching.
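
The abstract reports ABX results as a percentage of similarity to the target speaker. As a rough illustration only, and not the authors' evaluation code, the sketch below shows one way such a percentage could be computed from listener judgments. It assumes each response records whether the listener judged the synthesized sample X to be closer to the source speaker or to the target speaker; the function name `abx_target_similarity` and the sample responses are hypothetical.

```python
from collections import Counter


def abx_target_similarity(responses):
    """Return the percentage of ABX responses that picked the target speaker.

    `responses` is a list of strings, each either "source" or "target"
    (an assumed encoding of the listener's choice; not from the paper).
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    unknown = set(counts) - {"source", "target"}
    if unknown:
        raise ValueError(f"unexpected response labels: {sorted(unknown)}")
    return 100.0 * counts["target"] / len(responses)


# Hypothetical judgments: 8 of 10 listeners chose the target speaker.
example_responses = ["target"] * 8 + ["source"] * 2
print(abx_target_similarity(example_responses))  # prints 80.0
```

A per-language breakdown would apply the same function separately to the responses collected for each language.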

Keywords :

Gaussian processes; hidden Markov models; mixture models; natural languages; speech synthesis; ABX listening test; GMM-based cross-lingual voice conversion technique; Gaussian mixture model-based cross-lingual voice conversion technique; HMM-based synthesis technique; Hindi; Indian languages; Malayalam; Telugu; hidden Markov model based synthesis technique; human listener; language switching points; mixed language text; multilingual TTS system; multilingual speech corpus; multilingual text-to-speech system; native Tamil speaker; polyglot speech corpus; polyglot speech synthesizer; speaker switching; speech signal synthesis; voice conversion-based multilingual speech synthesizer; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Switches; Synthesizers;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

TENCON 2013 - 2013 IEEE Region 10 Conference (31194)

Conference_Location :

Xi'an

ISSN :

2159-3442

Print_ISBN :

978-1-4799-2825-5

Type :

conf

DOI :

10.1109/TENCON.2013.6719019

Filename :

6719019

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2986669