Regional Information Center for Science and Technology (RICeST) - Speaker normalization and adaptation based on linear transformation

DocumentCode :

310578

Title :

Speaker normalization and adaptation based on linear transformation

Author :

Ishii, Jun ; Tonomura, Masahiro

Author_Institution :

ATR Interpreting Telecommun. Res. Labs., Kyoto, Japan

Volume :

fYear :

1997

fDate :

21-24 Apr 1997

Firstpage :

1055

Abstract :

We propose novel speaker independent (SI) modeling and speaker adaptation based on a linear transformation. An SI model and speaker dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes serious problems: the probability distributions of the SI models become broad, and the SI models do not give good initial estimates for speaker adaptation. To solve these problems, a normalized SI model is generated by removing speaker characteristics using a shift vector obtained by the maximum likelihood linear regression (MLLR) technique. In addition, we propose a speaker adaptation method that combines the MLLR and maximum a posteriori (MAP) techniques, starting from the normalized SI model. Experiments were performed on a Japanese phoneme recognition test using continuous density mixture Gaussian HMMs. In the baseline recognition test, the normalized SI model achieved a 12.8% reduction in the phoneme recognition error rate compared to the conventional SI model. Furthermore, the proposed adaptation method using the normalized SI model was more effective than the tested conventional method regardless of the amount of adaptation data.
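
The two ingredients named in the abstract, an MLLR-style shift vector and a MAP mean update, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal-covariance Gaussians, a single global shift, and occupation weights (`gamma`) that are already available from an alignment step. The function names and the prior weight `tau` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def estimate_shift(frames, gamma, means, variances):
    """Maximum-likelihood global shift b such that (means + b) best fits frames.

    Assumes diagonal covariances (illustrative simplification).
    frames: (T, D) feature vectors of one speaker
    gamma: (T, M) occupation weights of each frame for each Gaussian
    means, variances: (M, D) Gaussian parameters of the SI model
    """
    occ = gamma.sum(axis=0)                # (M,) total occupancy per Gaussian
    first = gamma.T @ frames               # (M, D) weighted sum of frames
    num = ((first - occ[:, None] * means) / variances).sum(axis=0)
    den = (occ[:, None] / variances).sum(axis=0)
    return num / den                       # (D,) shift vector

def normalize_frames(frames, shift):
    """Remove the speaker-specific shift from the features."""
    return frames - shift

def map_update_means(frames, gamma, prior_means, tau=10.0):
    """Standard MAP mean update: interpolate prior means and data statistics.

    tau is a hypothetical prior weight; larger values trust the prior more.
    """
    occ = gamma.sum(axis=0)                # (M,)
    first = gamma.T @ frames               # (M, D)
    return (tau * prior_means + first) / (tau + occ)[:, None]
```

Under these assumptions, a normalized SI model would be trained on `normalize_frames(frames, estimate_shift(...))` computed per training speaker. At adaptation time the same shift would be estimated for the new speaker first, and `map_update_means` would then refine the means as more adaptation data becomes available.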

Keywords :

Gaussian processes; acoustic signal processing; hidden Markov models; maximum likelihood estimation; probability; speaker recognition; speech processing; Japanese phoneme recognition test; acoustic data preprocessing; adaptation data; continuous density mixture Gaussian HMM; experiments; initial estimates; linear transformation; maximum a posteriori techniques; maximum likelihood linear regression; normalized SI model; phoneme recognition error rate reduction; probability distributions; shift vector; speaker adaptation; speaker dependent models; speaker independent modeling; speaker normalization; Adaptation model; Character generation; Error analysis; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Performance evaluation; Probability distribution; Testing; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location :

Munich

ISSN :

1520-6149

Print_ISBN :

0-8186-7919-0

Type :

conf

DOI :

10.1109/ICASSP.1997.596122

Filename :

596122

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=310578