Mapping between ultrasound and vowel speech using DNN framework

Author

Xinyuan Zheng ; Jianguo Wei ; Wenhuan Lu ; Qiang Fang ; Jianwu Dang

Author_Institution

Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

372

Lastpage

376

Abstract

Building a mapping between articulatory movements and the corresponding speech could greatly facilitate speech training and speech aids for voiceless patients. In this paper, we propose a deep learning framework for building a mapping between articulatory information and the corresponding speech, which were recorded using an ultrasound system. The dataset includes six Chinese vowels. We use a Bimodal Deep Autoencoder algorithm based on RBMs to learn the relationship between speech and articulation, that is, the weight matrices of their representations. Speech and ultrasound images are reconstructed from the extracted features. The reconstruction error for articulation with our method is lower than that of a PCA-based approach, and the reconstructed speech is similar to the original. We also propose a mapping from ultrasound tongue images to the acoustic signal using a revised Denoising Autoencoder; the results show that it is a promising approach. In contrast, another experiment synthesizes ultrasound tongue images from speech, but its results still need improvement.

Keywords

acoustic signal processing; feature extraction; handicapped aids; image reconstruction; natural language processing; neural nets; speech processing; ultrasonic imaging; Chinese vowels; DNN framework; RBM; acoustic signal; articulatory movements; bimodal deep autoencoder algorithm; deep neural network framework; denoising autoencoder; speech aid; speech reconstruction; speech training; ultrasound image reconstruction; ultrasound system; ultrasound tongue image; voiceless patients; vowel speech; Acoustics; Feature extraction; Image reconstruction; Speech; Synchronization; Tongue; Ultrasonic imaging; DBN; Denoising Autoencoder; articulatory-acoustic mapping

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936700

Filename

6936700