Title :
Implementing PCA-based speaker adaptation methods in a Persian ASR system
Author :
Ansari, Zohreh ; Almasganj, Farshad
Author_Institution :
Biomed. Eng. Fac., Amirkabir Univ. of Technol. (Tehran Polytechnic), Tehran, Iran
Abstract :
Eigenspace-based speaker adaptation approaches, such as eigenvoice (EV) and eigenspace-based MLLR (EMLLR), have been shown to be more effective than traditional speaker adaptation algorithms for rapid adaptation tasks. In these methods, principal component analysis (PCA) is applied to a diverse set of speaker characteristics to extract orthogonal basis vectors that capture most of the variation among speakers' voices. A speaker-adapted model is then represented as a weighted combination of these basis vectors, so the number of parameters to be estimated from the adaptation data decreases dramatically. In this paper, a set of experiments is conducted on the FARSDAT database using short segments of adaptation speech (5-10 seconds). Experimental results show absolute improvements in phoneme recognition rate of 3.8% and 3.3% for supervised and unsupervised adaptation, respectively. By comparison, common speaker adaptation methods such as MLLR do not work properly under these conditions. Moreover, for tasks in which a considerable amount of adaptation data is available, the EV algorithm is extended by segmenting the eigenspace. The effects of increasing the number of mixtures in the acoustic hidden Markov models and of using a different feature extraction method in the presented algorithm are also studied.
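As a rough illustration of the eigenvoice idea summarized in the abstract, the sketch below assumes each training speaker is described by a "supervector" (for example, concatenated Gaussian mean vectors), applies PCA to obtain basis vectors, and then estimates only a handful of weights for a new speaker. The function names, the supervector layout, and the use of ordinary least squares are assumptions made for illustration; eigenvoice systems typically estimate the weights by maximum likelihood, and nothing here reproduces the paper's actual implementation.

```python
import numpy as np


def train_eigenvoices(supervectors, num_eigenvoices):
    """PCA over speaker supervectors (one row per training speaker).

    Returns the mean supervector and the top principal directions
    (hypothetical helper; the row-per-speaker layout is an assumption).
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions, ordered by
    # decreasing singular value of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_eigenvoices]


def estimate_weights(mean, eigenvoices, observed, mask):
    """Fit eigenvoice weights from a partially observed supervector.

    `mask` marks the dimensions actually seen in the adaptation data.
    Least squares is a simplified stand-in for maximum likelihood
    weight estimation.
    """
    design = eigenvoices[:, mask].T      # (num_observed, num_eigenvoices)
    target = (observed - mean)[mask]     # (num_observed,)
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def adapt(mean, eigenvoices, weights):
    """Adapted supervector: mean plus a weighted sum of the eigenvoices."""
    return mean + weights @ eigenvoices
```

With K eigenvoices, only K weights are estimated per speaker instead of every entry of the supervector, which is why a few seconds of speech can suffice when the unobserved dimensions are filled in by the eigenspace.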
Keywords :
eigenvalues and eigenfunctions; feature extraction; hidden Markov models; maximum likelihood estimation; parameter estimation; principal component analysis; speaker recognition; EV algorithm; FARSDAT database; PCA-based speaker adaptation method; Persian ASR system; acoustic hidden Markov model; eigenspace-based MLLR; eigenspace-based speaker adaptation approach; feature extraction method; maximum likelihood linear regression algorithm; orthogonal basis vector; phoneme recognition rate; speaker adapted model; time 5 s to 10 s; Adaptation model; Mel frequency cepstral coefficient; Silicon; eigenvoice; normalizing; speaker adaptation
Conference_Title :
Telecommunications (IST), 2010 5th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4244-8183-5
DOI :
10.1109/ISTEL.2010.5734126