DocumentCode :
2590578
Title :
Visual speech recognition with loosely synchronized feature streams
Author :
Saenko, Kate ; Livescu, Karen ; Siracusa, Michael ; Wilson, Kevin ; Glass, James ; Darrell, Trevor
Author_Institution :
Comput. Sci. & Artificial Intelligence Lab., Massachusetts Inst. of Technol., Cambridge, MA
Volume :
2
fYear :
2005
fDate :
17-21 Oct. 2005
Firstpage :
1424
Abstract :
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utterance task. We show comparative results on lip detection and speech/non-speech classification, as well as recognition performance against several baseline systems.
Keywords :
belief networks; feature extraction; image classification; speech recognition; synchronisation; Bayesian network; command utterance; feature classifier; feature streams; lip detection; nonspeech classification; speech classification; spoken isolated phrases detection; spoken isolated phrases recognition; visual speech detection; visual speech recognition; visual-only recognition system; Bayesian methods; Computer vision; Detectors; Face detection; Glass; Hidden Markov models; Speech recognition; Support vector machine classification; Support vector machines; Switches
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
Conference_Location :
Beijing
ISSN :
1550-5499
Print_ISBN :
0-7695-2334-X
Type :
conf
DOI :
10.1109/ICCV.2005.251
Filename :
1544886