Regional Information Center for Science and Technology - HMM-based speech recognition using state-dependent, linear transforms on Mel-warped DFT features

DocumentCode :

302071

Title :

HMM-based speech recognition using state-dependent, linear transforms on Mel-warped DFT features

Author :

Rathinavelu, C. ; Deng, L.

Author_Institution :

Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada

Volume :

fYear :

1996

fDate :

7-10 May 1996

Firstpage :

Abstract :

We investigate the interactions of front-end feature extraction and back-end classification techniques in an HMM-based speech recognizer. This work concentrates on finding the optimal linear transformation of Mel-warped short-time DFT information according to the minimum classification error criterion. These transformations, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error count. The discriminatively derived state-dependent transformations on the DFT data are then combined with their first time derivatives to produce a basic feature set. Experimental results show that Mel-warped DFT features, subject to appropriate transformation in a state-dependent manner, are more effective than the Mel-frequency cepstral coefficients that have dominated current speech recognition technology. The best error rate reduction of 9% is obtained using the new model, tested on a TIMIT phone classification task, relative to a conventional HMM.
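
The feature pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of a simple first difference as the time derivative, the choice to apply the same state matrix to the derivative, and the best-competitor form of the smoothed error are all assumptions made for illustration. The record does not give the paper's actual formulas.

```python
import math

def apply_transform(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def first_difference(frames):
    """Assumed time derivative: frame[t] - frame[t-1], zeros for t = 0."""
    deltas = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        deltas.append([c - p for c, p in zip(cur, prev)])
    return deltas

def state_dependent_features(frames, state_path, transforms):
    """Transform each Mel-warped DFT frame with the matrix of its aligned
    HMM state, then append the transformed first difference (assumption)."""
    deltas = first_difference(frames)
    feats = []
    for x, dx, s in zip(frames, deltas, state_path):
        W = transforms[s]  # hypothetical per-state matrix
        feats.append(apply_transform(W, x) + apply_transform(W, dx))
    return feats

def smoothed_error(scores, correct, gamma=1.0):
    """Sigmoid-smoothed error count for one token: compares the score of
    the correct class with that of its best competitor (assumed form)."""
    best_rival = max(s for k, s in enumerate(scores) if k != correct)
    d = best_rival - scores[correct]  # > 0 means misclassified
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

In a scheme of this kind, gradient descent would adjust both the per-state matrices and the HMM parameters to reduce the sum of `smoothed_error` over the training tokens. Because the sigmoid is differentiable, it serves as a trainable stand-in for the raw error count.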

Keywords :

cepstral analysis; discrete Fourier transforms; feature extraction; hidden Markov models; learning (artificial intelligence); speech processing; speech recognition; DFT data; HMM based speech recognition; HMM parameters; Mel warped DFT features; Mel-frequency cepstral coefficients; TIMIT phone classification task; back-end classification techniques; error rate reduction; experimental results; feature set; first time derivatives; front-end feature extraction; gradient descent method; linear transforms; minimum classification error criterion; optimal linear transformation; state-dependent transformations; supervised learning; Automatic speech recognition; Cepstral analysis; Discrete Fourier transforms; Discrete cosine transforms; Feature extraction; Filter bank; Hidden Markov models; Mel frequency cepstral coefficient; Speech recognition; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on

Conference_Location :

Atlanta, GA

ISSN :

1520-6149

Print_ISBN :

0-7803-3192-3

Type :

conf

DOI :

10.1109/ICASSP.1996.540277

Filename :

540277

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=302071