DocumentCode :
1904517
Title :
Sensory fusion: integrating visual and auditory information for recognizing speech
Author :
Wolff, Gregory J.
Author_Institution :
Ricoh California Res. Center, Menlo Park, CA, USA
fYear :
1993
fDate :
1993
Firstpage :
672
Abstract :
A straightforward method of combining information from separate sources for pattern recognition is reported. Conditional class probabilities are computed for each channel independently, using a special network architecture. These probabilities are then combined according to Bayes' rule under the assumption of conditional independence. This method does not require any parameters other than those used to model each modality individually, is simple to compute, and automatically compensates for differences between the two channels. It is argued that this method can be a good starting point, even if conditional independence does not hold, especially when limited training data is available. Experiments in recognizing phonemes from acoustic and visual inputs indicate that this method can outperform more powerful models and that it exhibits behavior similar to that of humans.
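The combination rule described in the abstract is simple enough to sketch in a few lines. The following Python snippet (function and variable names are hypothetical; this is a minimal illustration of the stated rule, not the paper's implementation) fuses per-channel conditional class probabilities under the conditional-independence assumption, dividing out the class prior once so it is not counted twice:

    import numpy as np

    def fuse_channels(p_class_given_audio, p_class_given_video, class_priors):
        # Combine per-channel posteriors P(c|a) and P(c|v) via Bayes' rule,
        # assuming conditional independence of the channels given the class:
        #     P(c|a,v)  proportional to  P(c|a) * P(c|v) / P(c)
        # (Hypothetical sketch of the combination rule in the abstract.)
        unnormalized = p_class_given_audio * p_class_given_video / class_priors
        return unnormalized / unnormalized.sum()  # renormalize over classes

    # Example: three phoneme classes with uniform priors (made-up numbers)
    p_audio = np.array([0.7, 0.2, 0.1])   # P(c | acoustic input)
    p_video = np.array([0.5, 0.4, 0.1])   # P(c | visual input)
    priors  = np.array([1/3, 1/3, 1/3])
    print(fuse_channels(p_audio, p_video, priors))

Note that the only quantities required are the per-modality posteriors and the class priors, which matches the abstract's claim that no parameters beyond the individual modality models are needed.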
Keywords :
Bayes methods; neural nets; probability; sensor fusion; speech recognition; Bayes method; acoustic inputs; network architecture; pattern recognition; probabilities; recognizing speech; sensor fusion; time delay neural network; visual inputs; Acoustic measurements; Computer architecture; Computer networks; Humans; Information resources; Learning systems; Pattern recognition; Signal detection; Speech recognition; Training data;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
1993 IEEE International Conference on Neural Networks
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7803-0999-5
Type :
conf
DOI :
10.1109/ICNN.1993.298635
Filename :
298635