A Compact Representation of Visual Speech Data Using Latent Variables

Author

Ziheng Zhou; Xiaopeng Hong; Guoying Zhao; Matti Pietikäinen

Author_Institution

Dept. of Comput. Sci. & Eng., Univ. of Oulu, Oulu, Finland

Volume

36

Issue

1

fYear

2014

fDate

Jan. 2014

Firstpage

1

Lastpage

1

Abstract

The problem of visual speech recognition involves decoding the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model that provides a compact representation of visual speech data. The model uses latent variables to represent separately the inter-speaker variations in visual appearance and the variations caused by uttering. It also incorporates the structural information of the visual data observed within an utterance by modelling that structure with a path graph and placing the variables' priors along the graph's embedded curve.
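The abstract does not state the model's equations, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a linear-Gaussian form x_t = mu + F h + G w_t + eps_t, in which one latent variable h is shared by all frames of an utterance (inter-speaker variation) and a per-frame variable w_t captures variation caused by uttering. It further assumes that "placing priors along the embedded curve" means centring each w_t's Gaussian prior on the t-th point of the path graph's Laplacian-eigenmap curve. The closed-form eigenpairs of the path-graph Laplacian (a DCT-II basis) are standard; all function names, dimensions, and noise levels are hypothetical.

```python
import math
import random

def path_graph_laplacian(T):
    """Laplacian L = D - A of the path graph 0 - 1 - ... - (T-1) (one node per frame)."""
    L = [[0.0] * T for _ in range(T)]
    for t in range(T - 1):
        # Each edge (t, t+1) adds 1 to both degrees and -1 to both off-diagonal entries.
        L[t][t] += 1.0
        L[t + 1][t + 1] += 1.0
        L[t][t + 1] -= 1.0
        L[t + 1][t] -= 1.0
    return L

def path_graph_eigenpair(T, k):
    """Closed-form k-th eigenpair of the path-graph Laplacian (DCT-II basis vector)."""
    lam = 2.0 - 2.0 * math.cos(math.pi * k / T)
    vec = [math.cos(math.pi * k * (2 * t + 1) / (2 * T)) for t in range(T)]
    return lam, vec

def embedded_curve(T, d):
    """Map frame t to (v_1(t), ..., v_d(t)), skipping the constant v_0: a curve in R^d."""
    cols = [path_graph_eigenpair(T, k)[1] for k in range(1, d + 1)]
    return [[cols[j][t] for j in range(d)] for t in range(T)]

def sample_utterance(T, mu, F, G, d, noise=0.01, prior_std=0.05, rng=random):
    """Hypothetical generative sketch: x_t = mu + F h + G w_t + eps_t for t = 0..T-1."""
    D = len(mu)
    # One speaker-level latent variable h ~ N(0, I), shared by every frame.
    h = [rng.gauss(0.0, 1.0) for _ in range(len(F[0]))]
    curve = embedded_curve(T, d)
    frames = []
    for t in range(T):
        # Assumed prior: w_t ~ N(curve[t], prior_std^2 I), centred on the curve point.
        w = [rng.gauss(c, prior_std) for c in curve[t]]
        x = [mu[i]
             + sum(F[i][j] * h[j] for j in range(len(h)))
             + sum(G[i][j] * w[j] for j in range(d))
             + rng.gauss(0.0, noise)
             for i in range(D)]
        frames.append(x)
    return frames
```

For example, `sample_utterance(8, [0.0] * 4, F, G, 3, rng=random.Random(0))` with a 4x2 matrix `F` and a 4x3 matrix `G` returns eight 4-dimensional frames. Frames from the same call share `h`, while their `w_t` values trace the embedded curve in order, which is one way a path graph can encode the temporal ordering of frames within an utterance.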

Keywords

graph theory; image representation; image sequences; speech recognition; video signal processing; compact data representation; embedded curve; generative latent variable model; high-dimensional visual space; inter-speaker variations; path graph; utterance; video dynamics; visual appearances; visual speech data; visual speech recognition; Data models; Hidden Markov models; Image sequences; Mouth; Speech; Speech recognition; Visualization; Computer vision; Pattern analysis; Representations, data structures, and transforms; Databases, Factual; Humans; Pattern Recognition, Automated; Speech Recognition Software; Video Recording

fLanguage

English

Journal_Title

IEEE Transactions on Pattern Analysis and Machine Intelligence

Publisher

IEEE

ISSN

0162-8828

Type

jour

DOI

10.1109/TPAMI.2013.173

Filename

6601598