Regional Information Center for Science and Technology - A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin

DocumentCode :

1125755

Title :

A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS

Author :

Qian, Yao ; Liang, Hui ; Soong, Frank K.

Author_Institution :

Microsoft Res. Asia, Beijing, China

Volume :

Issue :

fYear :

2009

Firstpage :

1231

Lastpage :

1239

Abstract :

We propose a hidden Markov model (HMM)-based bilingual (Mandarin and English) text-to-speech (TTS) system to synthesize natural speech for given bilingual text. A simple baseline system consisting of two independent monolingual HMM synthesizers is built first from corresponding Mandarin and English data recorded by a bilingual speaker. A new, mixed-language TTS system is then constructed by asking language-independent and language-specific questions for sharing HMM states across the two languages in decision-tree-based clustering. By sharing states, the new system has a smaller footprint than the baseline system. Speech synthesized by the new system sounds very similar to the baseline for non-mixed, Mandarin or English, monolingual sentences but much better for mixed-language sentences. This higher quality of mixed-language output is confirmed by a preference score, 60.2% to 39.8%, in a subjective listening test. A cross-language state mapping algorithm is further proposed for cross-language synthesis when only monolingual (English) recorded data from a source language speaker is available. Mandarin speech is then synthesized with the HMM model parameters in the nearest neighbor leaf nodes of the English decision tree. The nearest neighbor is measured with the Kullback-Leibler divergence (KLD) and mappings between leaf nodes in the decision trees of the source and target languages are established via the speech data recorded by a different, bilingual speaker. High voice (speaker) similarity is preserved in the synthesized target language sentences by using source-language recordings from a monolingual speaker. Perceptual tests conducted on synthesized Mandarin speech show 1) high intelligibility, which is confirmed by a Chinese character transcription accuracy of 92.1%, and 2) decent speech quality, with an average mean opinion score (MOS) of 3.1.
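The nearest-neighbor state mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each decision-tree leaf node is summarized by a single diagonal-covariance Gaussian and uses a symmetrized KLD as the distance, neither of which the abstract specifies. The function names (`kld_diag_gauss`, `symmetric_kld`, `map_states`) and the leaf labels are hypothetical.

```python
import math

def kld_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(p || q) between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kld(p, q):
    """Symmetrized KLD (an assumption of this sketch); p and q are (mean, variance) pairs."""
    return kld_diag_gauss(*p, *q) + kld_diag_gauss(*q, *p)

def map_states(target_leaves, source_leaves):
    """Map each target-language leaf to its nearest source-language leaf by KLD."""
    mapping = {}
    for t_name, t_gauss in target_leaves.items():
        mapping[t_name] = min(
            source_leaves,
            key=lambda s_name: symmetric_kld(t_gauss, source_leaves[s_name]),
        )
    return mapping

# Toy leaf nodes (made-up values), standing in for models trained on the
# bilingual speaker's Mandarin and English data.
mandarin_leaves = {
    "man_1": ([0.0, 1.0], [1.0, 1.0]),
    "man_2": ([5.0, 5.0], [1.0, 2.0]),
}
english_leaves = {
    "eng_1": ([0.1, 0.9], [1.0, 1.0]),
    "eng_2": ([4.8, 5.2], [1.0, 2.0]),
    "eng_3": ([10.0, -3.0], [0.5, 0.5]),
}

print(map_states(mandarin_leaves, english_leaves))
```

In the setup the abstract describes, a mapping of this kind is established on the bilingual speaker's data, and Mandarin speech is then synthesized from the parameters of the corresponding leaf nodes in the monolingual English speaker's decision tree.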

Keywords :

decision trees; hidden Markov models; natural language processing; speech synthesis; English decision tree; Kullback-Leibler divergence; Mandarin speech; bilingual speaker; cross-language state sharing; decision-tree based clustering; hidden Markov model; independent monolingual HMM synthesizer; mapping approach; mixed-language sentences; speech quality; text-to-speech system; Asia; Decision trees; Engines; Hidden Markov models; Loudspeakers; Natural languages; Nearest neighbor searches; Speech processing; Speech synthesis; Testing; Bilingual; Kullback–Leibler divergence (KLD); hidden Markov model (HMM)-based speech synthesis; new language synthesis

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2015708

Filename :

5153557

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1125755