Title :
Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages
Author :
Ramani, B. ; Actlin Jeeva, M.P. ; Vijayalakshmi, P. ; Nagarajan, T.
Author_Institution :
SSN Coll. of Eng., Chennai, India
Abstract :
A multilingual text-to-speech (TTS) system synthesizes speech in multiple languages for a given text, in a form that is intelligible to a human listener. However, when the system is given mixed-language text, the synthesized output exhibits speaker switching at the language switching points, which is annoying to listeners. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates synthesized speech in multiple languages with a single voice identity. This can be achieved either by inherent voice conversion during synthesis or by using voice conversion to convert the multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using a hidden Markov model (HMM)-based synthesis technique. Here, speech data collected from native speakers of the Indian languages Telugu, Malayalam, and Hindi are converted to have the voice identity of the native Tamil speaker. Building an HMM-based synthesizer from the resulting polyglot corpus enables the system to synthesize speech for any given text in any of these languages, or for mixed-language text. The performance of the polyglot speech synthesizer is evaluated for the similarity of the synthesized speech to the source or target speaker using an ABX listening test. The scores obtained show that the percentage of similarity to the target Tamil speaker varies from 73% to 86%. Further, the performance of the system is analyzed for speaker switching.
Keywords :
Gaussian processes; hidden Markov models; mixture models; natural languages; speech synthesis; ABX listening test; GMM-based cross-lingual voice conversion technique; Gaussian mixture model-based cross-lingual voice conversion technique; HMM-based synthesis technique; Hindi; Indian languages; Malayalam; Telugu; hidden Markov model based synthesis technique; human listener; language switching points; mixed language text; multilingual TTS system; multilingual speech corpus; multilingual text-to-speech system; native Tamil speaker; polyglot speech corpus; polyglot speech synthesizer; speaker switching; speech signal synthesis; voice conversion-based multilingual speech synthesizer; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Switches; Synthesizers;
Conference_Title :
TENCON 2013 - 2013 IEEE Region 10 Conference (31194)
Conference_Location :
Xi'an
Print_ISBN :
978-1-4799-2825-5
DOI :
10.1109/TENCON.2013.6719019