Title : 
Speaker transformation using sentence HMM based alignments and detailed prosody modification
         
        
            Author : 
Arslan, Levent M. ; Talkin, David
         
        
            Author_Institution : 
Entropic Res. Lab., Washington, DC, USA
         
        
        
        
        
        
            Abstract : 
This paper presents several improvements to our voice conversion system, which we refer to as the speaker transformation algorithm using segmental codebooks (STASC). First, a new concept, the sentence HMM, is introduced for aligning speech waveforms that share the same text. This alignment technique allows reliable, high-resolution mapping between two speech waveforms. In addition, it is observed that energy and speaking rate differences between two speakers are not constant across all phonemes. Therefore, a codebook-based duration and energy scaling algorithm is proposed. Finally, a more detailed pitch modification is introduced that takes into account pitch range differences between the source and target speakers in addition to mean pitch level differences. The proposed changes made a significant impact on the quality of transformed speech. Subjective listening tests showed that intelligibility is maintained at the same level as natural speech after the speaker transformation.
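
The abstract does not state the pitch modification formula. As a rough illustration only, one common way to account for both mean pitch level and pitch range is linear mean and variance scaling of F0, which is assumed here and may differ from what STASC actually does. The function name `convert_pitch` and all of its parameters are hypothetical.

```python
def convert_pitch(f0_source, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 values (Hz) onto the target speaker's pitch statistics.

    Assumption: linear mean/variance scaling, not confirmed by the abstract.
    Frames with F0 <= 0 are treated as unvoiced and passed through as 0.0.
    """
    if src_std <= 0:
        raise ValueError("src_std must be positive")

    range_ratio = tgt_std / src_std  # pitch range difference between speakers
    converted = []
    for f0 in f0_source:
        if f0 <= 0:
            # Unvoiced frame: there is no pitch to modify.
            converted.append(0.0)
        else:
            # Shift to the target mean and rescale deviations by the range ratio.
            converted.append(tgt_mean + range_ratio * (f0 - src_mean))
    return converted


# Hypothetical statistics: source centered at 100 Hz, target at 200 Hz
# with a pitch range three times as wide.
source_contour = [100.0, 0.0, 120.0]
target_contour = convert_pitch(source_contour, 100.0, 10.0, 200.0, 30.0)
```

Scaling by the mean alone would move the contour to the target's register but keep the source speaker's pitch excursions; the range ratio is what widens or narrows those excursions.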
         
        
            Keywords : 
hidden Markov models; speech coding; speech intelligibility; speech processing; speech synthesis; codebook based duration algorithm; detailed prosody modification; energy differences; energy scaling algorithm; high resolution mapping; mean pitch level differences; phonemes; pitch modification; pitch range differences; segmental codebooks; sentence HMM based alignments; source speakers; speaker transformation; speaking rate differences; speech intelligibility; speech waveforms; subjective listening tests; target speakers; voice conversion system; Energy resolution; Frequency estimation; Hidden Markov models; Laboratories; Loudspeakers; Multimedia systems; Natural languages; Speech recognition; Speech synthesis; Training data;
         
        
        
        
            Conference_Title : 
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
         
        
            Conference_Location : 
Seattle, WA
         
        
        
            Print_ISBN : 
0-7803-4428-6
         
        
        
            DOI : 
10.1109/ICASSP.1998.674424