DocumentCode :
600144
Title :
Speaker dependent visual word recognition by using sequential mouth shape codes
Author :
Tasaka, Takafumi ; Hamada, Nozomu
Author_Institution :
Dept. of Syst. Design Eng., Keio Univ., Yokohama, Japan
fYear :
2012
fDate :
4-7 Nov. 2012
Firstpage :
96
Lastpage :
101
Abstract :
Visual speech recognition, or lip reading, is an approach to noise-robust speech recognition that adds the speaker's visual cues to the audio information. Visual-only speech recognition is also applicable to speaker verification and to multimedia interfaces that support speech-impaired persons. The sequential mouth-shape code method is an effective lip-reading approach, particularly for uttered Japanese words. It uses two kinds of distinctive mouth shapes, known as the first and last mouth shapes, that appear intermittently. One advantage of this method is its low computational burden in the learning and word-registration processes. This paper proposes a novel visual word recognition system that detects and determines initial mouth-shape codes in order to recognize uttered consonants. As a result, the proposed method can discriminate between words that share the same sequence of vowel codes but differ in their consonant codes. Experiments demonstrate that the proposed system achieves a higher recognition rate than conventional methods.
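The abstract's central claim is that adding initial (consonant-related) mouth-shape codes lets the system separate words that share the same vowel-code sequence. The minimal Python sketch below illustrates that idea under stated assumptions. Each registered word is assumed to be stored as a sequence of (initial code, vowel code) pairs, and recognition is assumed to pick the nearest registered word by Levenshtein edit distance. The code labels ("C1", "a", ...), the function names `edit_distance` and `recognize`, and the edit-distance matching rule are hypothetical illustrations, not the authors' published algorithm.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical syllable code: (initial mouth-shape code, vowel mouth-shape code).
# The labels used here are placeholders, NOT the code set defined in the paper.
Syllable = Tuple[str, str]


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def recognize(observed: List[Syllable],
              lexicon: Dict[str, List[Syllable]],
              use_initial: bool = True) -> Optional[str]:
    """Return the registered word closest to the observed code sequence.

    Assumed matching rule (illustrative only): minimum edit distance.
    With use_initial=False only the vowel codes are compared, mimicking a
    vowel-only representation. Returns None when the best match is tied.
    """
    def key(seq: List[Syllable]) -> list:
        return list(seq) if use_initial else [v for _, v in seq]

    scores = {w: edit_distance(key(observed), key(codes))
              for w, codes in lexicon.items()}
    best = min(scores.values())
    winners = [w for w, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None  # None = ambiguous


# Usage: two hypothetical words with identical vowel codes but different initial codes.
lexicon = {
    "word_A": [("C1", "a"), ("C2", "i")],
    "word_B": [("C3", "a"), ("C2", "i")],
}
observed = [("C1", "a"), ("C2", "i")]
print(recognize(observed, lexicon))                     # word_A
print(recognize(observed, lexicon, use_initial=False))  # None (ambiguous)
```

With vowel codes alone the two entries tie, so the sketch cannot decide between them. Once the initial codes are included, the observed sequence matches only one entry. This mirrors, in simplified form, the discrimination the abstract describes.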
Keywords :
audio signal processing; multimedia computing; speech coding; speech recognition; distinctive mouth shape; lip reading; multimedia interface; noise robust speech recognition; sequential mouth shape code; sequential mouth-shape code method; sequential vowel code; speaker dependent visual word recognition; speaker verification; speaking impaired person; uttered Japanese word; uttering consonant recognition; visual cue; visual speech recognition; visual-only speech recognition; word lip recognition system; word registration process; Feature extraction; Image recognition; Mouth; Shape; Speech recognition; Trajectory; Visualization; audio-visual speech recognition; key frame extraction; lip reading; mouth-shape code; visual speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Signal Processing and Communications Systems (ISPACS), 2012 International Symposium on
Conference_Location :
New Taipei
Print_ISBN :
978-1-4673-5083-9
Electronic_ISBN :
978-1-4673-5081-5
Type :
conf
DOI :
10.1109/ISPACS.2012.6473460
Filename :
6473460