Title :
Adaptive training using structured transforms
Author :
Yu, K. ; Gales, M.J.F.
Author_Institution :
Dept. of Eng., Cambridge Univ., UK
Abstract :
Adaptive training is an important approach to training speech recognition systems on found, non-homogeneous data. The standard approach employs a single transform to represent unwanted acoustic variability. However, for found data there are commonly multiple acoustic factors affecting the speech signal. This paper investigates the use of multiple forms of transformation, structured transforms (ST), to represent the complex non-speech variabilities in an adaptive training framework. Two forms of transformation are considered, cluster mean interpolation and constrained MLLR; consequently, the canonical model here is a multi-cluster HMM. Both maximum likelihood (ML) and minimum phone error (MPE) re-estimation formulae for the canonical model are presented. This multi-cluster MPE training is also applicable to eigenvoice systems. Experiments comparing ST with standard adaptive training schemes were performed on a conversational telephone speech task. ST were found to reduce the word error rate significantly.
Keywords :
hidden Markov models; interpolation; learning (artificial intelligence); maximum likelihood estimation; natural languages; speech recognition; transforms; ML estimation; adaptive training; cluster mean interpolation; constrained MLLR; conversational telephone speech; eigenvoice systems; found data; minimum phone error estimation; nonhomogeneous data; speech recognition systems; structured transforms; unwanted acoustic variability; Acoustical engineering; Error analysis; Hidden Markov models; Interpolation; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Speech recognition; Telephony; Testing;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
Print_ISBN :
0-7803-8484-9
DOI :
10.1109/ICASSP.2004.1325986
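Note :
The two transform types named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used textbook forms: cluster mean interpolation builds an adapted mean as a weighted sum of per-cluster means, and constrained MLLR applies an affine transform to the observation, contributing a log-determinant term to the likelihood. The function names, the diagonal covariance, and the single global transform are assumptions made for illustration; the abstract does not give the paper's exact parameterization.

```python
import numpy as np

def interpolate_cluster_means(cluster_means, weights):
    """Adapted mean as a weighted sum of cluster means.

    cluster_means: (P, D) array, one mean per cluster (hypothetical layout).
    weights:       (P,) interpolation weights for one speaker/condition.
    """
    return np.asarray(weights) @ np.asarray(cluster_means)

def cmllr_transform(obs, A, b):
    """Constrained MLLR acts as an affine feature transform: A @ obs + b."""
    return np.asarray(A) @ np.asarray(obs) + np.asarray(b)

def st_log_likelihood(obs, cluster_means, weights, A, b, var):
    """Log-likelihood of one frame for one Gaussian under both transforms.

    Assumes a diagonal covariance `var` of shape (D,). The log|det A| term
    accounts for the change of variables introduced by the feature transform.
    """
    mu = interpolate_cluster_means(cluster_means, weights)
    x = cmllr_transform(obs, A, b)
    var = np.asarray(var)
    _, logdet = np.linalg.slogdet(np.asarray(A))
    log_gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
    return logdet + log_gauss
```

With an identity transform and equal weights over two clusters, the sketch reduces to an ordinary Gaussian evaluated at the average of the two cluster means.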