Title :
On combining statistical methods and frequency warping for high-quality voice conversion
Author :
Erro, Daniel ; Polyakova, Tatyana ; Moreno, Asunción
Author_Institution :
TALP Res. Center, Politec. de Catalunya Univ., Barcelona
fDate :
March 31 2008-April 4 2008
Abstract :
In current voice conversion systems, achieving high similarity between converted and target voices requires heavy signal manipulation, which causes significant quality degradation, to the point that in some cases the quality scores are unacceptable for real-life applications. Indeed, a tradeoff can be observed between the similarity scores and the quality scores achieved by a given voice conversion system. In our previous work we showed that statistical methods and frequency warping transformations can be combined to yield a better similarity-quality balance than conventional systems, owing to significant quality improvements. In this paper, two different ways of combining these approaches are compared through perceptual tests in order to determine the best strategy for high-quality voice conversion. The comparison is made under the same training conditions, using the same speech model and vector dimensions. The results indicate that listeners prefer the Weighted Frequency Warping method.
Keywords :
speech synthesis; statistical analysis; quality degradation; signal manipulation; speech model; statistical methods; voice conversion; weighted frequency warping; Degradation; Frequency conversion; Frequency synthesizers; Loudspeakers; Spatial databases; Testing; Gaussian mixture model
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518697