Regional Information Center for Science and Technology - Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

Abstract:

This article examines methods for improving the performance of GMM-based voice conversion systems, taking the GMM method as the baseline. Current methods use a single conversion function for all speech units, and the statistical averaging involved over-smooths the spectrum, which degrades the quality of the converted speech. In this paper, we first introduce the GMM2 method, in which several GMM models are used to model each phoneme. Furthermore, when matching the clusters of each state, we apply an LMR conversion before the Dynamic Time Warping algorithm to bring the parameters of the two speakers' corresponding states into closer correspondence. Another cause of quality loss in voice conversion systems is the imprecise estimation of speech signal parameters. To overcome this problem, the Generalized Harmonic Model is introduced in place of the sinusoidal harmonic model used in GMM2, yielding a further method called GMM3. Finally, we present the GMM4 method, which aims to improve system performance when only limited data and a restricted number of demi-syllables are available for training the conversion functions.
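The alignment step mentioned in the abstract (an LMR conversion applied before Dynamic Time Warping) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that LMR is fitted as a least-squares affine map, that the initial frame pairing used to fit it is a simple linear time interpolation, and that frames are compared by Euclidean distance. The function names `fit_lmr`, `apply_lmr`, `dtw_path`, and `align_with_lmr` are hypothetical.

```python
import numpy as np

def fit_lmr(src, tgt):
    """Fit an affine map tgt ~ [src, 1] @ W by least squares (assumed form of LMR)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_lmr(src, W):
    """Apply the fitted affine map to source frames."""
    X = np.hstack([src, np.ones((len(src), 1))])
    return X @ W

def dtw_path(a, b):
    """Classic DTW over Euclidean frame distances; returns a list of (i, j) frame pairs."""
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last cell to the first along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]

def align_with_lmr(src, tgt):
    """Pre-map source frames toward the target with LMR, then align with DTW.

    Assumption: the initial pairing used to fit LMR is a linear time interpolation.
    """
    idx = np.round(np.linspace(0, len(tgt) - 1, len(src))).astype(int)
    W = fit_lmr(src, tgt[idx])
    return dtw_path(apply_lmr(src, W), tgt)
```

Here `src` and `tgt` would be arrays of shape (frames, features) holding the spectral parameters of two corresponding states. Mapping the source frames into the target speaker's parameter space first is intended to make the DTW distances more meaningful than comparing raw parameters of different speakers.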