Title :
High quality lips animation with speech and captured facial action unit as A/V input
Author :
Wang, Lijuan; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing, China
Abstract :
Rendering realistic lip movements on an avatar from camera-captured human facial features is desirable in many applications, e.g., telepresence, video gaming, and social networking. We previously proposed using a Gaussian Mixture Model (GMM) to generate lip trajectories and tested it successfully in speech-to-lips conversion experiments, where only the audio signal (speech) is used as input. In this paper, the user's real-time facial features, called Action Units (AUs), which are well tracked by the Microsoft Kinect SDK with a consumer-grade RGB camera, are combined with speech to form a joint A/V input for lips animation. We test the lips animation performance and show that the new combined A/V input reduces the conversion error rate by 22% in a speaker-dependent test, compared with a baseline system.
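The abstract does not give the model details, so the following is only a minimal sketch of one common way a joint GMM can map an input frame to an output value: the per-frame conditional mean E[y | x]. It assumes diagonal input covariances, a scalar lip parameter y, and an input x formed by concatenating audio and AU features. The names `join_av`, `gmm_convert`, and `COMPONENTS`, and all parameter values, are hypothetical and chosen for illustration; this is not the authors' trajectory generation method.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def join_av(audio_feat, au_feat):
    """Concatenate audio and AU features into one joint input vector."""
    return list(audio_feat) + list(au_feat)


def gmm_convert(x, components):
    """Return E[y | x] under a joint GMM over (x, y).

    Each component is a dict with hypothetical keys:
    weight, mean_x, var_x (diagonal covariance of x), mean_y, cov_yx.
    """
    # Unnormalized posterior of each component given the input x.
    likes = []
    for c in components:
        p = c["weight"]
        for xd, m, v in zip(x, c["mean_x"], c["var_x"]):
            p *= gaussian_pdf(xd, m, v)
        likes.append(p)
    total = sum(likes)  # assumed non-zero for inputs near the training data

    # Posterior-weighted sum of the per-component conditional means.
    y = 0.0
    for c, p in zip(components, likes):
        cond_mean = c["mean_y"] + sum(
            s / v * (xd - m)
            for s, v, xd, m in zip(c["cov_yx"], c["var_x"], x, c["mean_x"])
        )
        y += (p / total) * cond_mean
    return y


# Toy parameters, invented for illustration only (not from the paper).
COMPONENTS = [
    {"weight": 0.5, "mean_x": [0.0, 0.0], "var_x": [1.0, 1.0],
     "mean_y": 0.0, "cov_yx": [0.5, 0.5]},
    {"weight": 0.5, "mean_x": [4.0, 4.0], "var_x": [1.0, 1.0],
     "mean_y": 2.0, "cov_yx": [0.5, 0.5]},
]

if __name__ == "__main__":
    frame = join_av([2.0], [2.0])  # one audio value, one AU value
    print(gmm_convert(frame, COMPONENTS))
```

In this sketch an audio-only baseline would simply omit the AU part of x; adding AU dimensions changes both the component posteriors and the per-component regression term.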
Keywords :
Gaussian processes; avatars; computer animation; face recognition; rendering (computer graphics); speaker recognition; AU; GMM; Gaussian mixture model; Microsoft Kinect SDK; audio signal; avatar; baseline system; captured facial action unit; consumer-grade RGB camera; conversion error rate; high quality lips animation; joint A-V input; lips trajectory generation; realistic lips movements rendering; realtime user facial features; social networking; speaker dependent test; speech-to-lips conversion; telepresence; video gaming; Cameras; Microphones; Principal component analysis; Speech; TV; Training;
Conference_Title :
Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
Conference_Location :
Hollywood, CA
Print_ISBN :
978-1-4673-4863-8