Title :
Robust multi-modal person identification with tolerance of facial expression
Author :
Fox, Niall A. ; Reilly, Richard B.
Author_Institution :
Dept. of Electron. & Electr. Eng., Univ. Coll. Dublin, Ireland
Abstract :
This work describes audio-visual speaker identification experiments carried out on a large data set of 251 subjects. Both audio and visual modeling are carried out using hidden Markov models. The visual modality uses the speaker's lip information. The audio and visual modalities are both degraded to emulate a train/test mismatch. The fusion method employed adapts automatically, using classifier score reliability estimates from both modalities to give improved audio-visual accuracies at all tested levels of audio and visual degradation, compared to the individual audio or visual modality accuracies. A maximum visual identification accuracy of 86% was achieved. This result is comparable to the performance of systems using the entire face, and supports the hypothesis that the system described would be tolerant to varying facial expression, since only the information around the speaker's lips is employed.
Keywords :
face recognition; hidden Markov models; speech recognition; audio-visual speaker identification; facial expression; multimodal person identification; visual modality; Audio databases; Identification of persons; Image databases; Signal processing; Visual databases;
Conference_Title :
2004 IEEE International Conference on Systems, Man and Cybernetics
Conference_Location :
The Hague
Print_ISBN :
0-7803-8566-7
DOI :
10.1109/ICSMC.2004.1398362