Title :
Optimizing speech synthesizer memory footprint through phoneme set reduction
Author :
Moberg, Marko ; Viikki, Olli
Author_Institution :
Speech & Audio Syst. Lab., Nokia Res. Center, Tampere, Finland
Abstract :
The embedded device market is currently searching for low memory footprint solutions to enable the use of speech technology, including speech synthesis, in mass-market products. The amount of memory consumed has a direct impact on product manufacturing costs; therefore, every means of saving memory should be exploited. In speech synthesis, some memory can be saved by reducing the number of phonemes in a given language. According to a listening evaluation test, certain affricates, diphthongs and long vowels in USA-English can be expressed as a combination of two other phonemes. Improved or equal intelligibility and quality were achieved by adding one new phoneme to the phoneme set and simultaneously removing four of the original phonemes, /tS/, /e/, /O/ and /OI/. The net decrease in the number of phonemes reduced the memory required to store Klatt88 synthesis parameters by 7% and the memory needed for the speech database in diphone concatenation synthesis by approximately 10%. More substantial memory savings can be achieved if a small degradation of quality and intelligibility is accepted.
Keywords :
embedded systems; speech intelligibility; speech processing; speech synthesis; Klatt88 synthesis parameters; USA-English; affricates; diphone concatenation synthesis; diphthongs; embedded device; intelligibility; long vowels; memory footprint; memory saving; phoneme combination; phoneme set reduction; speech quality; Audio systems; Costs; Databases; Handheld computers; Laboratories; Manufacturing; Natural languages; Synthesizers; Testing
Conference_Title :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224401