• DocumentCode
    12631
  • Title

    A Compact Representation of Visual Speech Data Using Latent Variables

  • Author

    Ziheng Zhou ; Xiaopeng Hong ; Guoying Zhao ; Matti Pietikainen

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Oulu, Oulu, Finland
  • Volume
    36
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    1
  • Lastpage
    1
  • Abstract
    The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the inter-speaker variations of visual appearances and those caused by uttering, and incorporates the structural information of the observed visual data within an utterance through modelling the structure using a path graph and placing variables' priors along its embedded curve.
  • Keywords
    graph theory; image representation; image sequences; speech recognition; video signal processing; compact data representation; embedded curve; generative latent variable model; high-dimensional visual space; inter-speaker variations; path graph; utterance; video dynamics; visual appearances; visual speech data; visual speech recognition; Data models; Hidden Markov models; Image sequences; Mouth; Speech; Speech recognition; Visualization; Computer vision; Pattern analysis; Representations, data structures, and transforms; Databases, Factual; Humans; Pattern Recognition, Automated; Speech Recognition Software; Video Recording
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2013.173
  • Filename
    6601598