Automatic lip-synchronized video-self-modeling intervention for voice disorders

Author

Shen, Ju; Ti, Changpeng; Cheung, Sen-Ching Samson; Patel, Ravi R.

Author_Institution

Center for Visualization & Virtual Environments, Univ. of Kentucky, Lexington, KY, USA

fYear

2012

fDate

10-13 Oct. 2012

Firstpage

244

Lastpage

249

Abstract

Video self-modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In speech-language pathology, VSM has been used successfully for language treatment in children with autism and for treating stuttering, a fluency disorder. However, technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient's original voice. The replacement speech is either synthesized by a text-to-speech engine or selected from a database of clean speech recordings using a voice similarity metric. To realign the replacement speech with the original video, we propose a novel audiovisual algorithm that combines audio segmentation with lip-state detection to identify corresponding time markers in the audio and video tracks. Lip synchronization is then achieved with an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Experimental evaluations on a dataset of 31 subjects demonstrate the effectiveness of the proposed techniques.

Keywords

audio signal processing; behavioural sciences; medical disorders; medical image processing; medical signal processing; patient treatment; speech; video signal processing; audio segmentation; audio track time markers; audiovisual algorithm; automatic lip synchronized VSM intervention; behavioral intervention technique; clean speech database; fluency disorder treatment; language disorder treatment; lip state detection; speech language pathology; stuttering treatment; text to speech engine; video self modeling; video sequences; video track time markers; voice disorder patient; voice disorders; voice similarity metric; Autism; Manuals; Robustness; Synchronization; audio-visual lip synchronization

fLanguage

English

Publisher

IEEE

Conference_Title

2012 IEEE 14th International Conference on e-Health Networking, Applications and Services (Healthcom)

Conference_Location

Beijing

Print_ISBN

978-1-4577-2039-0

Electronic_ISBN

978-1-4577-2038-3

Type

conf

DOI

10.1109/HealthCom.2012.6379415

Filename

6379415