مرکز منطقه ای اطلاع رساني علوم و فناوري - Segmental eigenvoice with delicate eigenspace for improved speaker adaptation

Abstract :

Eigenvoice techniques have been proposed to provide rapid speaker adaptation with very limited adaptation data, but the performance may be saturated when more adaptation data become available. This is because in these techniques an eigenspace with reduced dimensionality is established by properly utilizing the a priori knowledge from the large quantity of training data. The reduced dimensionality of the eigenspace requires less adaptation data to estimate the model parameters for the new speaker, but also makes it less easy to obtain more precise models with more adaptation data. In this paper, a new segmental eigenvoice approach is proposed, in which the eigenspace can be further segmented into N subeigenspaces by properly classifying the model parameters into N clusters. These N subeigenspaces can help to construct a more delicate eigenspace and more precise models when more adaptation data are available. It will be shown that there can be at least mixture-based, model-based and feature-based segmental eigenvoice approaches. Not only improved performance can be obtained, but these different approaches can be properly integrated to offer better performance. Two further approaches leading to improved segmental eigenvoice techniques with even better performance are also proposed. The experiments were performed with both a large vocabulary and a small vocabulary recognition tasks.