DocumentCode
12631
Title
A Compact Representation of Visual Speech Data Using Latent Variables
Author
Ziheng Zhou ; Xiaopeng Hong ; Guoying Zhao ; Matti Pietikainen
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Oulu, Oulu, Finland
Volume
36
Issue
1
fYear
2014
fDate
Jan. 2014
Firstpage
1
Lastpage
1
Abstract
The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the inter-speaker variations of visual appearances and those caused by uttering, and incorporates the structural information of the observed visual data within an utterance through modelling the structure using a path graph and placing variables' priors along its embedded curve.
Keywords
graph theory; image representation; image sequences; speech recognition; video signal processing; compact data representation; embedded curve; generative latent variable model; high-dimensional visual space; inter-speaker variations; path graph; utterance; video dynamics; visual appearances; visual speech data; visual speech recognition; Data models; Hidden Markov models; Image sequences; Mouth; Speech; Speech recognition; Visualization; Computer vision; Pattern analysis; Representations, data structures, and transforms; Databases, Factual; Humans; Pattern Recognition, Automated; Speech Recognition Software; Video Recording
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2013.173
Filename
6601598