  • DocumentCode
    680208
  • Title
    Visual speech learning from an e-tutor via dynamic lip movement-based video segmentation and comparison

  • Author
    Carol Mazuera; Xiaodong Yang; YingLi Tian
  • Author_Institution
    Dept. of Electr. Eng., City Coll. of New York, New York, NY, USA
  • fYear
    2013
  • fDate
    18-21 Dec. 2013
  • Firstpage
    548
  • Lastpage
    553
  • Abstract
    This paper is motivated by the difficulties that deaf students encounter when learning speechreading and speaking, the skills that enable them to communicate effectively with hearing people. We propose a prototype speech learning system based on analyzing and comparing the lip movements of an e-tutor with those of a deaf student in a video. The framework of the proposed system has two stages: lip movement segmentation and speech comparison. Lip movement segmentation extracts the frames of each word from a visual speech video sequence by analyzing the movement and shape of the lips. Speech comparison determines whether a student is producing a correct word utterance by comparing the student's lip shape and movements with those of the e-tutor. To model lip movement, we compute two dynamic features using a lip tracking method that employs landmark points to define lip shapes. We use these dynamic features along with Space-Time Interest Points (STIP) to capture lip movements. To evaluate the effectiveness of the proposed methods, we collect a visual speech learning dataset consisting of 220 videos and 1100 word utterances. The proposed system achieves promising performance in both visual speech segmentation and visual speech comparison on this dataset.
  • Keywords
    image motion analysis; image segmentation; image sequences; medical image processing; object tracking; speech processing; video signal processing; E-Tutor; deaf students; dynamic lip movement-based video segmentation; lip tracking method; space-time interest points; visual speech comparison; visual speech learning; visual speech video sequence; Auditory system; Lips; Mouth; Shape; Speech; Video sequences; Visualization; Dataset Collection; Deaf People; Segmentation; Visual Speech Comparison; Visual Speech Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Type
    conf
  • DOI
    10.1109/BIBM.2013.6732556
  • Filename
    6732556