Title :
Quality-enhanced voice morphing using maximum likelihood transformations
Author :
Ye, Hui ; Young, Steve
Author_Institution :
Eng. Dept., Cambridge Univ.
fDate :
7/1/2006 12:00:00 AM
Abstract :
Voice morphing is a technique for modifying a source speaker's speech to sound as if it were spoken by a designated target speaker. The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker, and linear transformations estimated from time-aligned parallel training data are commonly used to achieve this. However, the naive application of envelope transformation combined with the necessary pitch and duration modifications will result in noticeable artifacts. This paper studies the linear transformation approach to voice morphing and investigates two specific issues. First, a general maximum likelihood framework is proposed for transform estimation, which avoids the need for parallel training data inherent in conventional least mean square approaches. Second, the main causes of artifacts are identified as glottal coupling, unnatural phase dispersion, and the high spectral variance of unvoiced sounds, and compensation techniques are developed to mitigate these. The resulting voice morphing system is evaluated using both subjective and objective measures. These tests show that the proposed approaches are capable of effectively transforming speaker identity whilst maintaining high quality. Furthermore, they do not require carefully prepared parallel training data.
Keywords :
maximum likelihood estimation; speaker recognition; speech processing; speech synthesis; compensation techniques; conventional least mean square approaches; glottal coupling; high spectral variance; linear transformation; maximum likelihood transformations; quality-enhanced voice morphing; source speaker; spectral envelope; target speaker; time-aligned parallel training data; unnatural phase dispersion; Degradation; Interpolation; Loudspeakers; Maximum likelihood estimation; Natural languages; Robustness; Speech enhancement; Speech synthesis; Testing; Training data; Linear transformation; phase dispersion; voice conversion; voice morphing
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.860839