Title :
Enhancing speech recognition and speech understanding systems through non-phonetic cues
Author_Institution :
Dept. of Comput. Sci., Winston-Salem State Univ., Winston-Salem, NC, USA
Abstract :
Theoretical and practical techniques for the automatic extraction, representation and manipulation of information supplied by the body movements of a speaker are discussed. Since speech production and facial expressions have been shown to exhibit a synchronous relationship, optical information obtained from a camera trained on the speaker's face and body can prove useful in both correlating and disambiguating phoneme recognition by a voice recognition system. At the theoretical level, the roles played by proxemics, kinesics and deixis in interpreting body movement data in a linguistic setting are discussed. Also discussed are two systems under development: the vision microphone, which aids in the recognition of both continuous and discrete speech by tracking acoustic data and the facial movements that accompany speech production; and a gesture understanding system, which extracts information from the body movements of a speaker at both the microkinesic and macrokinesic levels to interpret the communication content of the speaker's utterance.
Keywords :
computer vision; gesture recognition; speech recognition; speech recognition equipment; speech-based user interfaces; body movements; facial expressions; gesture understanding system; nonphonetic cues; optical information; speech understanding systems; vision microphone; Cameras; Data mining; Face detection; Face recognition; Loudspeakers; Microphones; Production systems; Speech enhancement; Tracking
Conference_Title :
Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
0-7803-4778-1
DOI :
10.1109/ICSMC.1998.727494