Title :
Lipreading Using Profile Versus Frontal Views
Author :
Lucey, Patrick; Potamianos, Gerasimos
Author_Institution :
Speech, Audio, Image and Video Research Laboratory, Queensland University of Technology, Brisbane, QLD
Abstract :
Visual information from a speaker's mouth region is known to improve the robustness of automatic speech recognition. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face. In contrast, this paper investigates extracting visual speech information from the speaker's profile view and, to our knowledge, constitutes the first real attempt to attack this problem. As with any AVASR system, overall recognition performance depends heavily on the visual front end. This is especially true for profile-view data, because the facial features are heavily compacted compared with the frontal case. In particular, we describe our visual front-end approach and report experiments on a multi-subject, small-vocabulary, bimodal, multi-sensory database containing audio captured synchronously with frontal and profile face video. Our experiments show that AVASR is possible from profile views, with moderate performance degradation compared with frontal video data.
Keywords :
audio-visual systems; face recognition; feature extraction; speech recognition; video databases; vocabulary; AVASR; audio capturing; audio-visual automatic speech recognition; frontal image; information extraction; multisensory database; speaker's face; visual information; Automatic speech recognition; Data mining; Face detection; Facial features; Hidden Markov models; Humans; Laboratories; Mouth; Robustness; Speech recognition
Conference_Title :
2006 IEEE 8th Workshop on Multimedia Signal Processing
Conference_Location :
Victoria, BC
Print_ISBN :
0-7803-9751-7
Electronic_ISBN :
0-7803-9752-5
DOI :
10.1109/MMSP.2006.285261