Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems

Author

Potamianos, Gerasimos

Author_Institution

Inst. of Inf. & Telecommun., Nat. Centre for Sci. Res. Demokritos, Athens, Greece

fYear

2009

fDate

Nov. 13 2009-Dec. 17 2009

Firstpage

22

Lastpage

22

Abstract

Summary form only given. The presentation will provide an overview of the main research achievements and the state of the art in audio-visual speech processing, focusing mainly on audio-visual automatic speech recognition. The topic has been of interest to the speech research community because the visual modality holds the potential for increased robustness to acoustic noise. Nevertheless, significant challenges remain that have hindered practical applications of the technology, most notably difficulties with visual speech information extraction and with audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, such as multiparty interaction in smart spaces and broadcast news. These challenges are shared by a number of audio-visual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone; examples include speech activity detection, speaker diarization, and source separation.

Keywords

audio-visual systems; speech recognition; acoustic noise; audio-visual automatic speech recognition; audio-visual data sources; audio-visual fusion algorithm; bimodal speech technology; visual modality; visual speech information extraction; Automatic speech recognition; Broadcast technology; Broadcasting; Data mining; Noise robustness; Space technology; Speech enhancement; Speech processing

fLanguage

English

Publisher

ieee

Conference_Titel

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location

Merano

Print_ISBN

978-1-4244-5478-5

Electronic_ISBN

978-1-4244-5479-2

Type

conf

DOI

10.1109/ASRU.2009.5373530

Filename

5373530