DocumentCode :
12631
Title :
A Compact Representation of Visual Speech Data Using Latent Variables
Author :
Ziheng Zhou ; Xiaopeng Hong ; Guoying Zhao ; Matti Pietikäinen
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Oulu, Oulu, Finland
Volume :
36
Issue :
1
fYear :
2014
fDate :
Jan. 2014
Firstpage :
1
Lastpage :
1
Abstract :
The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the inter-speaker variations of visual appearances and those caused by uttering, and incorporates the structural information of the observed visual data within an utterance by modelling the structure using a path graph and placing the variables' priors along its embedded curve.
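The abstract refers to modelling an utterance as a path graph and placing priors along its embedded curve. The following is a minimal illustrative sketch, not the authors' implementation. It assumes one graph node per video frame and takes the curve to be the closed-form cosine eigenvectors of the path-graph Laplacian. The function names laplacian_apply, eigenvalue and embedded_curve are hypothetical.

```python
import math


def laplacian_apply(v):
    """Multiply the path-graph Laplacian (L = D - A) by vector v.

    Node i is connected only to nodes i - 1 and i + 1, so the matrix
    never has to be built explicitly.
    """
    n = len(v)
    out = []
    for i in range(n):
        degree = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        total = degree * v[i]
        if i > 0:
            total -= v[i - 1]
        if i < n - 1:
            total -= v[i + 1]
        out.append(total)
    return out


def eigenvalue(n, k):
    """k-th eigenvalue of the Laplacian of a path graph with n nodes."""
    return 2.0 - 2.0 * math.cos(math.pi * k / n)


def embedded_curve(n, dim):
    """Return n points in R^dim, one per frame.

    Coordinate k of frame i is cos(pi * k * (i + 0.5) / n), which is the
    k-th non-constant Laplacian eigenvector evaluated at node i.
    """
    return [
        [math.cos(math.pi * k * (i + 0.5) / n) for k in range(1, dim + 1)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Assumed toy sizes: a 10-frame utterance embedded in 3 dimensions.
    for frame, point in enumerate(embedded_curve(10, 3)):
        print(frame, [round(c, 3) for c in point])
```

Each frame of an utterance maps to one point on a smooth curve, so neighbouring frames stay close in the embedding. How the paper turns such a curve into priors over latent variables is not stated in the abstract and is left out of this sketch.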
Keywords :
graph theory; image representation; image sequences; speech recognition; video signal processing; compact data representation; embedded curve; generative latent variable model; high-dimensional visual space; inter-speaker variations; path graph; utterance; video dynamics; visual appearances; visual speech data; visual speech recognition; Data models; Hidden Markov models; Mouth; Speech; Visualization; Computer vision; Pattern analysis; Representations, data structures, and transforms; Databases, Factual; Humans; Pattern Recognition, Automated; Speech Recognition Software; Video Recording
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2013.173
Filename :
6601598