Title :
A Compact Representation of Visual Speech Data Using Latent Variables
Author :
Ziheng Zhou ; Xiaopeng Hong ; Guoying Zhao ; Matti Pietikainen
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Oulu, Oulu, Finland
Abstract :
The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the inter-speaker variations of visual appearances and those caused by uttering, and incorporates the structural information of the observed visual data within an utterance through modelling the structure using a path graph and placing variables' priors along its embedded curve.
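The abstract does not spell out how the path graph is embedded, so the following is only a generic sketch of the underlying idea, not the authors' model: the frames of an utterance are treated as nodes of a path graph, and the low-frequency eigenvectors of its Laplacian trace a smooth curve along which priors could be placed. The function names, the frame count and the embedding dimension are illustrative assumptions.

```python
import numpy as np


def path_graph_laplacian(n):
    """Unnormalised Laplacian L = D - A of a path graph with n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0  # edge between frame t and frame t + 1
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def embed_path(n, dim):
    """Return the dim smallest non-zero eigenvalues and their eigenvectors.

    Row t of the returned matrix is the embedded position of frame t.
    (Hypothetical helper; the paper's exact construction may differ.)
    """
    L = path_graph_laplacian(n)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the first (constant) eigenvector, whose eigenvalue is 0.
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]


n_frames = 20  # assumed utterance length in frames
eigvals, curve = embed_path(n_frames, dim=3)

# The k-th eigenvector of a path graph Laplacian is a sampled cosine,
# cos(pi * k * (t + 1/2) / n), so the embedded curve is smooth in t.
t = np.arange(n_frames)
analytic = np.cos(np.pi * 1 * (t + 0.5) / n_frames)
analytic /= np.linalg.norm(analytic)
```

Because the eigenvectors are sampled cosines, consecutive frames land close together on the curve, which is what makes it a natural place to encode the temporal ordering of frames within an utterance.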
Keywords :
graph theory; image representation; image sequences; speech recognition; video signal processing; compact data representation; embedded curve; generative latent variable model; high-dimensional visual space; inter-speaker variations; path graph; utterance; video dynamics; visual appearances; visual speech data; visual speech recognition; Data models; Hidden Markov models; Image sequences; Mouth; Speech; Speech recognition; Visualization; Computer vision; Pattern analysis; Representations, data structures, and transforms; Databases, Factual; Humans; Pattern Recognition, Automated; Speech Recognition Software; Video Recording
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2013.173