Title :
Visual speech learning from an e-tutor via dynamic lip movement-based video segmentation and comparison
Author :
Carol Mazuera ; Xiaodong Yang ; YingLi Tian
Author_Institution :
Dept. of Electr. Eng., City Coll. of New York, New York, NY, USA
Abstract :
This paper is motivated by the difficulties that deaf students encounter when learning speechreading and speaking, the skills that enable them to communicate effectively with hearing people. We propose a speech learning prototype system based on the analysis and comparison of the lip movements of an e-tutor and those of a deaf student in a video. The framework of the proposed system is divided into two stages: lip movement segmentation and speech comparison. Lip movement segmentation extracts the frames of each word from a visual speech video sequence by analyzing the movement and shape of the lips. Speech comparison determines whether a student utters a word correctly by comparing the student's lip shape and movements with those of the e-tutor. To model lip movement, we compute two dynamic features using a lip tracking method that employs landmark points to define lip shapes. We use these dynamic features along with Space-Time Interest Points (STIP) to capture lip movements. To evaluate the effectiveness of the proposed methods, we collect a visual speech learning dataset consisting of 220 videos and 1100 word utterances. The proposed system achieves promising performance in both visual speech segmentation and visual speech comparison on this dataset.
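The two-stage idea in the abstract (segment word utterances from lip motion, then compare a student's lips against the e-tutor's) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' method: the width/height shape descriptor, the fixed motion threshold, the use of dynamic time warping for comparison, and all function names are hypothetical, and the STIP features mentioned in the abstract are not modeled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Shape = Tuple[float, float]  # (lip width, lip height), an assumed descriptor


def lip_shape(landmarks: Sequence[Point]) -> Shape:
    """Return (width, height) of the lip outline from its landmark points."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return max(xs) - min(xs), max(ys) - min(ys)


def motion_energy(shapes: Sequence[Shape]) -> List[float]:
    """Frame-to-frame change in width plus height (0.0 for the first frame)."""
    energy = [0.0]
    for (w0, h0), (w1, h1) in zip(shapes, shapes[1:]):
        energy.append(abs(w1 - w0) + abs(h1 - h0))
    return energy


def segment_words(energy: Sequence[float], threshold: float,
                  min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the motion
    energy stays above the (assumed) threshold for at least min_len frames."""
    segments = []
    start = None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(energy) - start >= min_len:
        segments.append((start, len(energy)))
    return segments


def dtw_distance(a: Sequence[Shape], b: Sequence[Shape]) -> float:
    """Dynamic time warping distance between two lip-shape sequences.
    DTW is a stand-in comparison technique chosen for this sketch."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1][0] - b[j - 1][0]) + abs(a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

In such a sketch, a student's word segment would be judged against the e-tutor's segment for the same word by checking whether `dtw_distance` falls below some tuned cutoff; the cutoff, like the segmentation threshold, would have to be chosen empirically.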
Keywords :
image motion analysis; image segmentation; image sequences; medical image processing; object tracking; speech processing; video signal processing; E-Tutor; deaf students; dynamic lip movement-based video segmentation; lip tracking method; space-time interest points; visual speech comparison; visual speech learning; visual speech video sequence; Auditory system; Lips; Mouth; Shape; Speech; Video sequences; Visualization; Dataset Collection; Deaf People; Segmentation; Visual Speech Comparison; Visual Speech Learning;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732556