Title of article :
Fast model selection based speaker adaptation for nonnative speech
Author/Authors :
He, Xiaodong (author); Zhao, Yunxin (author)
Issue Information :
Journal with serial number, year 2003
From page :
298
Abstract :
The problem of adapting acoustic models of native English speech to nonnative speakers is addressed from the perspective of adaptive model complexity selection. The goal is to select model complexity dynamically for each nonnative talker so as to optimize the balance between model robustness to pronunciation variations and model detailedness for discrimination of speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse: the expectation of the log-likelihood (EL) of the adaptation data is computed from distributions of mismatch biases between model and data, and the model complexity is selected to maximize EL. The MEL-based complexity selection is further combined with MLLR (maximum likelihood linear regression) to enable adaptation of both the complexity and the parameters of the acoustic models. Experiments were performed on WSJ1 data from speakers with a wide range of foreign accents. Results show that MEL-based complexity selection is feasible with as little as one adaptation utterance, and that it dynamically selects the proper model complexity as the amount of adaptation data increases. Compared with standard MLLR, the MEL+MLLR method leads to consistent and significant improvement in recognition accuracy on nonnative speakers, without performance degradation on native speakers.
Journal title :
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
Serial Year :
2003
Record number :
86907