Title :
Audiovisual-to-Articulatory Speech Inversion Using HMMs
Author :
Katsamanis, Athanassios ; Papandreou, George ; Maragos, Petros
Author_Institution :
Nat. Tech. Univ. of Athens, Athens
Abstract :
We address the problem of audiovisual speech inversion, namely recovering the vocal tract's geometry from auditory and visual speech cues. We approach the problem in a statistical framework, combining ideas from multistream Hidden Markov Models and canonical correlation analysis, and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Our experiments show that exploiting both audio and visual modalities clearly improves performance relative to either audio-only or visual-only estimation. We report experiments on the QSMT database, which contains audio, video, and electromagnetic articulography data recorded in parallel.
Keywords :
correlation methods; geometry; hidden Markov models; speech processing; HMM; QSMT database; audiovisual speech inversion; audiovisual-to-articulatory speech inversion; auditory cues; canonical correlation analysis; multistream hidden Markov models; speech production system; statistical framework; visual speech cues; vocal tract geometry; Acoustics; Databases; Frequency estimation; Geometry; Hidden Markov models; Linear regression; Predictive models; Production systems; Speech analysis; Tongue
Conference_Title :
Multimedia Signal Processing, 2007. MMSP 2007. IEEE 9th Workshop on
Conference_Location :
Crete
Print_ISBN :
978-1-4244-1274-7
DOI :
10.1109/MMSP.2007.4412915