Development and analysis of various phone-sized unit-based speech synthesizers

Author

Rachel, G. Anushiya ; Christina, S. Lilly ; Solomi, V. Sherlin ; Ramani, B. ; Vijayalakshmi, P. ; Nagarajan, T.

Author_Institution

SSN Coll. of Eng., Kalavakkam, India

fYear

2013

fDate

25-27 Nov. 2013

Firstpage

1

Lastpage

5

Abstract

A speech synthesizer synthesizes speech in accordance with the text in any given language. Two important attributes of any synthesizer are the quality of the synthesized speech and the footprint size of the voice. Quality of the synthesized speech primarily refers to naturalness and intelligibility. The synthetic speech sounds natural when there are no glitches and when the prosody is well captured. The current work focuses on developing and analyzing HMM-based Tamil speech synthesizers using context-independent (monophone) and context-dependent (triphone and pentaphone) speech units. The mean opinion score is used to assess the quality of the synthetic speech produced by each of the systems. The dynamic characteristics of the source and system of the synthesized speech are also observed. The monophone-based synthesizer produces speech that is quite intelligible; however, it lacks naturalness since the prosody is not well captured. With the addition of contextual information, the quality of the synthesized speech is found to improve. The footprint size of the monophone-based system is around 264 kB, while that of the context-dependent speech unit-based synthesizers varies between 816 kB and 1720 kB. Depending upon the requirement, one has to make a compromise between quality and footprint size by selecting an appropriate speech unit.

Keywords

hidden Markov models; natural language processing; speech synthesis; HMM-based Tamil speech synthesizer; context-dependent speech unit-based synthesizer; monophone-based synthesizer; pentaphone speech unit; phone-sized unit-based speech synthesizer; speech synthesized quality; synthetic speech; triphone speech unit; voice footprint size; Data models; Hidden Markov models; Memory management; Natural languages; Speech; Speech synthesis; Synthesizers; HMM-based; phone-sized units; speech synthesis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference

Conference_Location

Gurgaon

Type

conf

DOI

10.1109/ICSDA.2013.6709897

Filename

6709897