Title :
Estimate articulatory MRI series from acoustic signal using deep architecture
Author :
Hao Li ; Jianhua Tao ; Minghao Yang ; Bin Liu
Author_Institution :
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
Abstract :
This paper presents our work on acoustic-to-articulatory inversion mapping, in which the articulatory data are MRI series of the articulators on the mid-sagittal plane. Deep architectures based on the restricted Boltzmann machine (RBM) and linear regression are employed to construct the audio-visual mapping. We test two architectures for initializing the neural network: a bottom-up stack of RBMs with a regression layer on top, and the same architecture with an extra Gaussian-Bernoulli RBM on top. GMM-based mapping is used as the baseline method. MRI data from the USC-TIMIT database are used for training. The experimental results show that the deep regression network is an effective model for mapping from the acoustic speech signal to articulatory MRI series, and indicate that initializing the top layer as a Gaussian-Bernoulli RBM, which compresses the MRI data before the linear regression, is the better strategy.
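As a rough illustration of the first architecture named in the abstract (a bottom-up stack of RBMs with a regression layer on top), the following is a minimal NumPy sketch and not the authors' implementation. It makes several assumptions that do not come from the paper: all layers are Bernoulli-Bernoulli RBMs trained with one-step contrastive divergence, inputs are scaled to [0, 1], the top layer is fitted by ordinary least squares, and no fine-tuning is performed. The names `RBM`, `pretrain_stack`, `encode`, `fit_linear_regression`, and `predict` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence.

    Assumption: inputs lie in [0, 1]. The abstract does not say which unit
    types or training procedure the authors used for the lower layers.
    """

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)


def pretrain_stack(x, layer_sizes, epochs=5, seed=0):
    """Greedy bottom-up pretraining: each RBM is trained on the layer below."""
    rng = np.random.default_rng(seed)
    rbms, h = [], x
    for n_hidden in layer_sizes:
        rbm = RBM(h.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(h)
        rbms.append(rbm)
        h = rbm.hidden_probs(h)
    return rbms


def encode(rbms, x):
    """Propagate inputs through the stack to get top-level features."""
    h = x
    for rbm in rbms:
        h = rbm.hidden_probs(h)
    return h


def fit_linear_regression(h, y):
    """Least-squares regression layer (with bias) on the top-level features."""
    h_bias = np.hstack([h, np.ones((h.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(h_bias, y, rcond=None)
    return weights


def predict(rbms, weights, x):
    h = encode(rbms, x)
    return np.hstack([h, np.ones((h.shape[0], 1))]) @ weights
```

In this sketch the rows of `x` would be acoustic feature vectors and the rows of `y` flattened MRI frames. The second architecture in the abstract would additionally pretrain a Gaussian-Bernoulli RBM on the MRI frames, so that the regression targets a compressed code rather than raw pixels; that variant is not shown here.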
Keywords :
Boltzmann machines; acoustic signal processing; audio databases; magnetic resonance imaging; speech processing; MRI data; RBM; USC-TIMIT database; acoustic signal; acoustic speech signal; acoustic-to-articulatory inversion mapping; audio-visual mapping; deep architecture; deep architectures; deep regression network; estimate articulatory MRI series; extra Gaussian-Bernoulli RBM; linear regression; midsagittal plan; neural network; restricted Boltzmann machine; top regression layer architecture; Acoustics; Head; Linear regression; Magnetic resonance imaging; Speech; Tongue; Training; MRI; acoustic-to-articulatory inversion; deep neural network; deep regression network;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178893