• DocumentCode
    2976145
  • Title
    Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems
  • Author
    Potamianos, Gerasimos
  • Author_Institution
    Inst. of Inf. & Telecommun., Nat. Centre for Sci. Res. Demokritos, Athens, Greece
  • fYear
    2009
  • fDate
    Nov. 13 2009-Dec. 17 2009
  • Firstpage
    22
  • Lastpage
    22
  • Abstract
    Summary form only given. The presentation will provide an overview of the main research achievements and the state of the art in audio-visual speech processing, focusing mainly on audio-visual automatic speech recognition. The topic has been of interest to the speech research community because the visual modality holds the potential for increased robustness to acoustic noise. Nevertheless, significant challenges remain that have hindered practical applications of the technology, most notably difficulties with visual speech information extraction and with audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, for example multiparty interaction in smart spaces and broadcast news. These challenges are shared by a number of audio-visual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone; examples include speech activity detection, speaker diarization, and source separation.
  • Keywords
    audio-visual systems; speech recognition; acoustic noise; audio-visual automatic speech recognition; audio-visual data sources; audio-visual fusion algorithm; bimodal speech technology; visual modality; visual speech information extraction; Acoustic noise; Automatic speech recognition; Broadcast technology; Broadcasting; Data mining; Noise robustness; Space technology; Speech enhancement; Speech processing; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
  • Conference_Location
    Merano
  • Print_ISBN
    978-1-4244-5478-5
  • Electronic_ISBN
    978-1-4244-5479-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2009.5373530
  • Filename
    5373530