Title :
HMM-based speech recognition using state-dependent, linear transforms on Mel-warped DFT features
Author :
Rathinavelu, C. ; Deng, L.
Author_Institution :
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
Abstract :
We investigate the interactions of front-end feature extraction and back-end classification techniques in an HMM-based speech recognizer. This work concentrates on finding the optimal linear transformation of Mel-warped short-time DFT information according to the minimum classification error criterion. These transformations, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error count. The discriminatively derived state-dependent transformations on the DFT data are then combined with their first time derivatives to produce a basic feature set. Experimental results show that Mel-warped DFT features, subject to appropriate transformation in a state-dependent manner, are more effective than the Mel-frequency cepstral coefficients that have dominated current speech recognition technology. The best error rate reduction of 9%, relative to a conventional HMM, is obtained using the new model, tested on a TIMIT phone classification task.
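The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Mel-warped DFT features are log filter-bank energies, it approximates the first time derivative with a simple central difference, and all names (`dct_matrix`, `apply_transform`, `state_features`) are hypothetical. The transforms here are fixed by hand, whereas the paper trains them discriminatively. If every state shared the same DCT matrix, the static part would reduce to conventional MFCC-style coefficients.

```python
import math

def dct_matrix(num_ceps, num_bands):
    """Unnormalized DCT-II basis: the fixed transform behind conventional MFCCs."""
    return [[math.cos(math.pi * k * (n + 0.5) / num_bands)
             for n in range(num_bands)]
            for k in range(num_ceps)]

def apply_transform(W, x):
    """Matrix-vector product W @ x using plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def state_features(transforms, state, frames, t):
    """Static + delta features of frame t under one HMM state's transform.

    transforms: dict mapping state id -> matrix (assumed already trained).
    frames: list of Mel-warped log-energy vectors (assumed input format).
    The delta is a central difference; edge frames are clamped.
    """
    W = transforms[state]
    prev = apply_transform(W, frames[max(t - 1, 0)])
    cur = apply_transform(W, frames[t])
    nxt = apply_transform(W, frames[min(t + 1, len(frames) - 1)])
    delta = [(n - p) / 2.0 for n, p in zip(nxt, prev)]
    return cur + delta  # concatenation: static part followed by derivative part
```

Because the transform is indexed by state, the same frame yields a different feature vector for each state whose likelihood is evaluated, which is the main departure from a single fixed front end.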
Keywords :
cepstral analysis; discrete Fourier transforms; feature extraction; hidden Markov models; learning (artificial intelligence); speech processing; speech recognition; DFT data; HMM based speech recognition; HMM parameters; Mel warped DFT features; Mel-frequency cepstral coefficients; TIMIT phone classification task; back-end classification techniques; error rate reduction; experimental results; feature set; first time derivatives; front-end feature extraction; gradient descent method; linear transforms; minimum classification error criterion; optimal linear transformation; state-dependent transformations; supervised learning; Automatic speech recognition; Cepstral analysis; Discrete Fourier transforms; Discrete cosine transforms; Feature extraction; Filter bank; Hidden Markov models; Mel frequency cepstral coefficient; Speech recognition; Vectors;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.540277