Title :
Improved generation of speech from its abstract and structural representation
Author :
Minematsu, Nobuaki ; Saito, Daisuke ; Hirose, Keikichi
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Abstract :
This paper describes an improved method for the structure-to-speech conversion framework we proposed previously. The framework aims at building a speaking machine by simulating infants' language acquisition. Most speech synthesizers take a phoneme sequence as input and convert it to speech sounds; that is, they are reading machines. Infants, however, initially acquire the capacity for speech communication without phonemes or reading. Because their phonemic awareness is very immature, young children can hardly decompose an utterance into a sequence of phonemes, yet they enjoy speech communication with their parents. According to developmental psychology, infants acquire the holistic sound patterns that underlie individual utterances, called the word Gestalt. Infants reproduce this sound pattern using their very short vocal tubes, i.e. vocal imitation. In our previous studies, the word Gestalt was defined mathematically as the speech structure, and a method of extracting it from a word utterance was proposed and applied to ASR and CALL. Further, the reverse process, i.e. structure-to-speech conversion, was realized. In this paper, a method of improving our speech generation framework based on a structural cost function is proposed and evaluated.
Keywords :
psychology; speech processing; speech synthesis; Gestalt; holistic sound patterns; infant language acquisition; speaking machine; speech communication; speech generation; speech sounds; speech structure; speech synthesizers; structural representation; word utterance; Cepstrum; Cost function; Electron tubes; Equations; Feature extraction; Speech; structural cost function; invariance; vocal imitation
Conference_Title :
Signal Processing (ICSP), 2010 IEEE 10th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5897-4
DOI :
10.1109/ICOSP.2010.5656958