Title :
Sensory fusion: integrating visual and auditory information for recognizing speech
Author :
Wolff, Gregory J.
Author_Institution :
Ricoh California Res. Center, Menlo Park, CA, USA
Abstract :
A straightforward method of combining information from separate sources for pattern recognition is reported. Conditional class probabilities are computed for each channel independently, using a special network architecture. These probabilities are then combined according to Bayes' rule under the assumption of conditional independence. This method requires no parameters beyond those used to model each modality individually, is simple to compute, and automatically compensates for differences between the two channels. It is argued that this method is a good starting point even when conditional independence does not hold, especially when limited training data are available. Experiments in recognizing phonemes from acoustic and visual inputs indicate that this method can outperform more powerful models and that it exhibits behavior similar to that of humans.
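The fusion rule described in the abstract follows from Bayes' rule: if the acoustic observation a and the visual observation v are conditionally independent given the class c, then P(c | a, v) is proportional to P(c | a) P(c | v) / P(c). The sketch below is a minimal illustration of that combination step, not the paper's implementation; the function name, array names, and example values are hypothetical.

    import numpy as np

    def fuse_posteriors(p_given_acoustic, p_given_visual, priors):
        """Combine per-channel class posteriors via Bayes' rule under
        conditional independence of the channels given the class:

            P(c | a, v)  proportional to  P(c | a) * P(c | v) / P(c)

        Each argument is a length-K vector over the K classes.
        """
        unnormalized = p_given_acoustic * p_given_visual / priors
        return unnormalized / unnormalized.sum()

    # Hypothetical posteriors for three phoneme classes from each channel.
    p_acoustic = np.array([0.6, 0.3, 0.1])   # acoustic network output
    p_visual   = np.array([0.5, 0.1, 0.4])   # visual network output
    priors     = np.array([1/3, 1/3, 1/3])   # uniform class priors
    print(fuse_posteriors(p_acoustic, p_visual, priors))

With these example values the fused posterior concentrates on the first class (about 0.81), since both channels favor it; note that no extra parameters beyond the per-channel models and the class priors are involved, which matches the abstract's claim.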
Keywords :
Bayes methods; neural nets; probability; sensor fusion; speech recognition; Bayes method; acoustic inputs; network architecture; pattern recognition; probabilities; recognizing speech; sensor fusion; time delay neural network; visual inputs; Acoustic measurements; Computer architecture; Computer networks; Humans; Information resources; Learning systems; Pattern recognition; Signal detection; Speech recognition; Training data;
Conference_Title :
Neural Networks, 1993. IEEE International Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
0-7803-0999-5
DOI :
10.1109/ICNN.1993.298635