DocumentCode :
794834
Title :
An HMM-based speech-to-video synthesizer
Author :
Williams, Jay J. ; Katsaggelos, Aggelos K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA
Volume :
13
Issue :
4
fYear :
2002
fDate :
7/1/2002
Firstpage :
900
Lastpage :
915
Abstract :
Emerging broadband communication systems promise a future of multimedia telephony, e.g., the addition of visual information to telephone conversations. It is useful to consider the problem of generating the information critical for speechreading from the existing narrowband communication systems used for speech. This paper focuses on the problem of synthesizing visual articulatory movements given the acoustic speech signal. In this application, the acoustic speech signal is analyzed and the corresponding articulatory movements are synthesized for speechreading. This paper describes a hidden Markov model (HMM)-based visual speech synthesizer. The key elements in applying HMMs to this problem are the decomposition of the overall modeling task into key stages and the judicious choice of the observation vector's components for each stage. The main contribution of this paper is a novel correlation HMM that integrates independently trained acoustic and visual HMMs for speech-to-visual synthesis. This model allows increased flexibility in choosing model topologies for the acoustic and visual HMMs. Moreover, the proposed model requires less training data than early integration modeling techniques. Results from objective experimental analysis show that the proposed approach can reduce time alignment errors by 37.4% compared to the conventional temporal scaling method. Furthermore, subjective results indicate that the proposed model can increase speech understanding.
Keywords :
handicapped aids; hidden Markov models; image enhancement; speech processing; videotelephony; HMM-based speech-to-video synthesizer; broadband communication systems; hidden Markov model; lip-reading; multimedia telephony; speechreading; task decomposition; visual articulatory movement synthesis; Acoustic applications; Broadband communication; Hidden Markov models; Multimedia systems; Narrowband; Signal synthesis; Speech analysis; Speech synthesis; Synthesizers; Telephony;
fLanguage :
English
Journal_Title :
Neural Networks, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9227
Type :
jour
DOI :
10.1109/TNN.2002.1021891
Filename :
1021891