DocumentCode
2976145
Title
Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems
Author
Potamianos, Gerasimos
Author_Institution
Inst. of Inf. & Telecommun., Nat. Centre for Sci. Res. Demokritos, Athens, Greece
fYear
2009
fDate
Dec. 13 2009-Dec. 17 2009
Firstpage
22
Lastpage
22
Abstract
Summary form only given. The presentation will provide an overview of the main research achievements and the state of the art in audio-visual speech processing, focusing mainly on audio-visual automatic speech recognition. The topic has been of interest in the speech research community because the visual modality holds the potential for increased robustness to acoustic noise. Nevertheless, significant challenges remain that have hindered practical applications of the technology, most notably difficulties with visual speech information extraction and with developing audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, for example multiparty interaction in smart spaces or broadcast news. These challenges are shared by a number of audio-visual speech technologies beyond the core speech recognition problem, in which the visual modality has the potential to resolve ambiguity inherent in the audio signal alone, for example speech activity detection, speaker diarization, and source separation.
Keywords
audio-visual systems; speech recognition; acoustic noise; audio-visual automatic speech recognition; audio-visual data sources; audio-visual fusion algorithm; bimodal speech technology; visual modality; visual speech information extraction; Acoustic noise; Automatic speech recognition; Broadcast technology; Broadcasting; Data mining; Noise robustness; Space technology; Speech enhancement; Speech processing; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location
Merano
Print_ISBN
978-1-4244-5478-5
Electronic_ISBN
978-1-4244-5479-2
Type
conf
DOI
10.1109/ASRU.2009.5373530
Filename
5373530