Title :
Analysis on acoustic similarities between Tamil and English phonemes using product of likelihood-Gaussians for an HMM-based mixed-language synthesizer
Author :
Solomi, V. Sherlin ; Christina, S. Lilly ; Rachel, G. Anushiya ; Ramani, B. ; Vijayalakshmi, P. ; Nagarajan, T.
Author_Institution :
SSN Coll. of Eng., Chennai, India
Abstract :
A mixed-language (polyglot) synthesizer is one that synthesizes intelligible multilingual speech in a single speaker's voice with appropriate pronunciations. Two main requirements of a mixed-language synthesizer are that (i) the transition from one language to another (language switching) and (ii) the influence of one language on another should not be perceivable. In this regard, in [1], while developing a bilingual text-to-speech (TTS) system for Mandarin and English, the minimum Kullback-Leibler divergence (KLD) criterion, applied state-wise to context-independent hidden Markov models (HMMs), is used to cluster the states of acoustically similar phonemes across the two languages. In the current work, using context-independent HMMs trained separately for two languages, namely Tamil and English, an attempt has been made to find the acoustically similar phonemes using the product of Gaussians (PoG) in the log-likelihood space. A speech corpus with Tamil and English data, uttered by the same speaker, is used for this task. The quality of the speech synthesized by the mixed-language synthesizer is assessed subjectively, and a mean opinion score of 3.49 is obtained when acoustically similar phonemes alone are merged. In addition, analyses are carried out to find the amount of language switching and the influence of one language on the other.
Keywords :
Gaussian processes; hidden Markov models; natural language processing; speech synthesis; English phoneme; HMM; HMM-based mixed-language synthesizer; Mandarin language; Tamil phoneme; acoustic similarities analysis; bilingual text-to-speech system; context-independent hidden Markov models; log-likelihood space; minimum Kullback-Leibler divergence criterion; multilingual speech; polyglot synthesizer; product of Gaussians; product of likelihood-Gaussians; speech corpus; Gold; Speech; Speech recognition; Switches; Synthesizers; Mixed-language synthesizer; log-likelihood
Conference_Title :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
DOI :
10.1109/ICSDA.2013.6709898