• DocumentCode
    179243
  • Title
    Continuous visual speech recognition for multimodal fusion
  • Author
    Benhaim, Eric ; Sahbi, Hichem ; Vitte, Guillaume
  • Author_Institution
    LTCI, Telecom ParisTech, Paris, France
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    4618
  • Lastpage
    4622
  • Abstract
    It is well established that human speech perception is a multimodal process that combines both visual and acoustic information. In automatic speech perception, visual analysis is also crucial, as it provides complementary information that enhances the performance of audio systems, especially in highly noisy environments. In this paper, we propose a unified probabilistic framework for speech unit recognition that combines both visual and audio information. The method is based on the optimization of a criterion that achieves continuous speech unit segmentation and decoding using a learned (joint) phonetic-visemic model. Experiments conducted on the standard LIPS2008 dataset show a clear and consistent gain of our multimodal approach over others.
  • Keywords
    decoding; probability; speech recognition; acoustic information; audio information; automatic speech perception; complementary information; continuous speech unit segmentation; continuous visual speech recognition; human speech perception; multimodal fusion; phonetic-visemic model; speech unit recognition; standard LIPS2008 dataset; unified probabilistic framework; visual information; Acoustics; Joints; Speech; Speech enhancement; Speech recognition; Training; Visualization; Visual speech unit recognition; multi-class support vector machines; multimodal segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854477
  • Filename
    6854477