Author/Authors :
Ig-Jae Kim (author), Hyeong-Seok Ko (author)
Abstract :
This paper proposes a new technique for generating three-dimensional speech animation. The proposed technique
takes advantage of both data-driven and machine learning approaches. It seeks to utilize the most relevant part
of the captured utterances for the synthesis of input phoneme sequences. If highly relevant data are missing or
lacking, then it utilizes less relevant (but more abundant) data and relies more heavily on machine learning for the
lip-synch generation. This hybrid approach produces results that are more faithful to real data than conventional
machine learning approaches, while being better able to handle incompleteness or redundancy in the database
than conventional data-driven approaches. Experimental results, obtained by applying the proposed technique to
the utterance of various words and phrases, show that (1) the proposed technique generates lip-synchs of different
qualities depending on the availability of the data, and (2) the new technique produces more realistic results than
conventional machine learning approaches.