Footprint reduction of Concatenative Text-To-Speech synthesizers using polynomial temporal decomposition

Author

Shoham, Tamar ; Malah, David ; Shechtman, Slava

Author_Institution

Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel

fYear

2010

fDate

3-5 March 2010

Firstpage

1

Lastpage

5

Abstract

High-quality, low-footprint Concatenative Text-To-Speech (CTTS) synthesizers pose a persistent challenge in the field of speech processing. The spectral parameters representing the short speech segments used in the concatenation process account for a large portion of the required memory. In this paper we propose a vectorial form of Polynomial Temporal Decomposition, combined with jointly optimal segmentation and polynomial order selection, to reduce the storage required for the spectral amplitude parameters by 50% while preserving the perceptual quality of the synthesized speech.
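
The idea of replacing per-frame spectral parameters with polynomial coefficients can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single, already chosen segment and a fixed polynomial order, and it omits the jointly optimal segmentation and order selection named in the abstract. All function and variable names (`fit_poly`, `compress_segment`, `reconstruct_segment`, the toy `frames` data) are hypothetical and introduced only for illustration.

```python
def fit_poly(ts, ys, order):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = order + 1
    # Normal equations A c = b, with A[i][j] = sum t^(i+j) and b[i] = sum y * t^i.
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def eval_poly(coeffs, t):
    """Evaluate a polynomial given coefficients, lowest power first."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def normalized_times(n_frames):
    """Frame times mapped to [0, 1]; requires at least two frames."""
    return [i / (n_frames - 1) for i in range(n_frames)]


def compress_segment(frames, order):
    """Fit one polynomial per vector dimension over the frames of a segment.

    Assumes len(frames) > order so that the fit is well determined.
    """
    ts = normalized_times(len(frames))
    dims = len(frames[0])
    return [fit_poly(ts, [f[d] for f in frames], order) for d in range(dims)]


def reconstruct_segment(coeffs_per_dim, n_frames):
    """Rebuild the per-frame vectors from the stored coefficients."""
    ts = normalized_times(n_frames)
    return [[eval_poly(c, t) for c in coeffs_per_dim] for t in ts]


# Toy segment: 8 frames of a 2-dimensional parameter vector whose
# trajectories happen to be exactly quadratic (illustrative data only).
n_frames = 8
frames = [[1 + 2 * t - t * t, 0.5 - t + 3 * t * t]
          for t in normalized_times(n_frames)]

coeffs = compress_segment(frames, order=2)
recon = reconstruct_segment(coeffs, n_frames)

raw_count = n_frames * len(frames[0])        # 16 values stored frame by frame
stored_count = sum(len(c) for c in coeffs)   # 6 polynomial coefficients
max_err = max(abs(a - b)
              for fr, rr in zip(frames, recon)
              for a, b in zip(fr, rr))
print(raw_count, stored_count, max_err)
```

In this toy case the trajectories are exact quadratics, so the reconstruction error is only numerical round-off. Real spectral trajectories would not fit exactly, which is presumably why the choice of segment boundaries and polynomial order matters for the trade-off between storage and quality.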

Keywords

polynomials; speech synthesis; footprint reduction; high quality low footprint concatenative text-to-speech synthesizers; polynomial order selection; polynomial temporal decomposition; spectral amplitude; spectral parameters; speech processing; Audio databases; Hidden Markov models; MPEG 7 Standard; Mel frequency cepstral coefficient; Polynomials; Spatial databases; Speech synthesis; Statistics; Synthesizers; Wavelet packets;

fLanguage

English

Publisher

IEEE

Conference_Titel

Communications, Control and Signal Processing (ISCCSP), 2010 4th International Symposium on

Conference_Location

Limassol

Print_ISBN

978-1-4244-6285-8

Type

conf

DOI

10.1109/ISCCSP.2010.5463316

Filename

5463316