An HMM-based speech-to-video synthesizer

Author

Williams, Jay J. ; Katsaggelos, Aggelos K.

Author_Institution

Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA

Volume

13

Issue

4

fYear

2002

fDate

7/1/2002 12:00:00 AM

Firstpage

900

Lastpage

915

Abstract

Emerging broadband communication systems promise a future of multimedia telephony, e.g., the addition of visual information to telephone conversations. It is therefore worth considering how to generate the information critical for speechreading from the speech carried by existing narrowband communication systems. This paper focuses on the problem of synthesizing visual articulatory movements given the acoustic speech signal. In this application, the acoustic speech signal is analyzed and the corresponding articulatory movements are synthesized for speechreading. This paper describes a hidden Markov model (HMM)-based visual speech synthesizer. The key elements in the application of HMMs to this problem are the decomposition of the overall modeling task into key stages and the judicious determination of the observation vector's components for each stage. The main contribution of this paper is a novel correlation HMM that is able to integrate independently trained acoustic and visual HMMs for speech-to-visual synthesis. This model allows increased flexibility in choosing model topologies for the acoustic and visual HMMs. Moreover, the proposed model reduces the amount of training data required compared to early integration modeling techniques. Results from objective experimental analysis show that the proposed approach can reduce time alignment errors by 37.4% compared to the conventional temporal scaling method. Furthermore, subjective results indicated that the proposed model can increase speech understanding.

Keywords

handicapped aids; hidden Markov models; image enhancement; speech processing; videotelephony; HMM-based speech-to-video synthesizer; broadband communication systems; hidden Markov model; lip-reading; multimedia telephony; speechreading; task decomposition; visual articulatory movement synthesis; Acoustic applications; Broadband communication; Hidden Markov models; Multimedia systems; Narrowband; Signal synthesis; Speech analysis; Speech synthesis; Synthesizers; Telephony;

fLanguage

English

Journal_Title

Neural Networks, IEEE Transactions on

Publisher

IEEE

ISSN

1045-9227

Type

jour

DOI

10.1109/TNN.2002.1021891

Filename

1021891