Title :
Mapping frames with DNN-HMM recognizer for non-parallel voice conversion
Author :
Minghui Dong;Chenyu Yang;Yanfeng Lu;Jochen Walter Ehnes;Dongyan Huang;Huaiping Ming;Rong Tong;Siu Wa Lee;Haizhou Li
Author_Institution :
Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore
Abstract :
To convert one speaker's voice to another's, the mapping between corresponding speech segments of the source and target speakers must first be obtained. In parallel voice conversion, the dynamic time warping (DTW) method is normally used to align the source and target signals. However, DTW-based mapping does not work for conversion between non-parallel speech data. In this paper, we propose to use a DNN-HMM recognizer to recognize each frame of both the source and target speech signals. The resulting vector of pseudo-likelihoods is then used to represent the frame, and the similarity between two frames is measured by the distance between their vectors. A clustering method is used to group both source and target frames, and the frame mapping from source to target is then established based on the clustering result. Experiments show that the proposed method can generate conversion results similar to those of parallel voice conversion.
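The frame-mapping step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-frame pseudo-likelihood vectors are assumed to be precomputed by a DNN-HMM recognizer, which is not shown. Euclidean distance stands in for the unspecified distance measure, and k-means with a deterministic farthest-point initialisation stands in for the unspecified clustering method. Each source frame is then mapped to the nearest target frame in its own cluster. The function names `farthest_point_init`, `kmeans` and `map_frames` are hypothetical.

```python
import numpy as np


def farthest_point_init(x, k):
    """Pick k initial centres deterministically: start at row 0, then
    repeatedly take the point farthest from all centres chosen so far."""
    idx = [0]
    d = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((x - x[nxt]) ** 2).sum(axis=1))
    return x[idx].copy()


def kmeans(x, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with squared Euclidean distance.
    Assumption: stands in for the paper's unspecified clustering method."""
    centers = farthest_point_init(x, k)
    for _ in range(n_iter):
        # Distance from every frame vector to every centre: shape (n, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        new_centers = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def map_frames(src_vecs, tgt_vecs, k):
    """For each source frame, return the index of the target frame it maps to.

    src_vecs: (n_src, n_classes) per-frame recognizer vectors, source speaker.
    tgt_vecs: (n_tgt, n_classes) per-frame recognizer vectors, target speaker.
    (Hypothetical inputs: assumed to come from a DNN-HMM recognizer.)
    """
    src_vecs = np.asarray(src_vecs, dtype=float)
    tgt_vecs = np.asarray(tgt_vecs, dtype=float)
    # Cluster source and target frames together in one shared space.
    labels = kmeans(np.vstack([src_vecs, tgt_vecs]), k)
    src_lab, tgt_lab = labels[:len(src_vecs)], labels[len(src_vecs):]
    mapping = np.empty(len(src_vecs), dtype=int)
    for i, (vec, lab) in enumerate(zip(src_vecs, src_lab)):
        candidates = np.flatnonzero(tgt_lab == lab)
        if candidates.size == 0:
            # No target frame fell into this cluster: search all target frames.
            candidates = np.arange(len(tgt_vecs))
        # Assumption: pick the nearest target frame (Euclidean) in the cluster.
        dist = np.linalg.norm(tgt_vecs[candidates] - vec, axis=1)
        mapping[i] = candidates[dist.argmin()]
    return mapping


if __name__ == "__main__":
    # Toy vectors over 3 hypothetical recognizer classes.
    src = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.05, 0.05, 0.90]]
    tgt = [[0.05, 0.10, 0.85], [0.85, 0.10, 0.05], [0.10, 0.85, 0.05]]
    print(map_frames(src, tgt, k=3))  # prints [1 2 0]
```

Mapping each source frame to a single nearest target frame inside its cluster is only one possible design. Averaging all target frames in the cluster would be another. The abstract does not say which rule the authors used.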
Keywords :
Decision support systems
Conference_Title :
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
DOI :
10.1109/APSIPA.2015.7415320