DocumentCode :
310578
Title :
Speaker normalization and adaptation based on linear transformation
Author :
Ishii, Jun ; Tonomura, Masahiro
Author_Institution :
ATR Interpreting Telecommun. Res. Labs., Kyoto, Japan
Volume :
2
fYear :
1997
fDate :
21-24 Apr 1997
Firstpage :
1055
Abstract :
We propose novel speaker independent (SI) modeling and speaker adaptation based on a linear transformation. An SI model and speaker dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes a serious problem: the probability distributions of the SI models become broad, and the SI models do not give good initial estimates for speaker adaptation. To solve these problems, a normalized SI model is generated by removing speaker characteristics using a shift vector obtained by the maximum likelihood linear regression (MLLR) technique. In addition, we propose a speaker adaptation method that combines the MLLR and maximum a posteriori (MAP) techniques, starting from the normalized SI model. Experiments were performed on a Japanese phoneme recognition test using continuous density mixture Gaussian HMMs. In the baseline recognition test, the normalized SI model achieved a 12.8% reduction in the phoneme recognition error rate compared to the conventional SI model. Furthermore, the proposed adaptation method using the normalized SI model was more effective than the tested conventional method regardless of the amount of adaptation data.
Keywords :
Gaussian processes; acoustic signal processing; hidden Markov models; maximum likelihood estimation; probability; speaker recognition; speech processing; Japanese phoneme recognition test; acoustic data preprocessing; adaptation data; continuous density mixture Gaussian HMM; experiments; initial estimates; linear transformation; maximum a posteriori techniques; maximum likelihood linear regression; normalized SI model; phoneme recognition error rate reduction; probability distributions; shift vector; speaker adaptation; speaker dependent models; speaker independent modeling; speaker normalization; Adaptation model; Character generation; Error analysis; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Performance evaluation; Probability distribution; Testing; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
ISSN :
1520-6149
Print_ISBN :
0-8186-7919-0
Type :
conf
DOI :
10.1109/ICASSP.1997.596122
Filename :
596122