Regional Information Center for Science and Technology - Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system

DocumentCode :

1187242

Title :

Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system

Author :

Pujol, Pere ; Pol, Susagna ; Nadeu, Climent ; Hagen, Astrid ; Bourlard, Hervé

Author_Institution :

TALP Res. Center, Univ. Politecnica de Catalunya, Barcelona, Spain

Volume :

Issue :

fYear :

2005

Firstpage :

Lastpage :

Abstract :

Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency derivatives of the logFBEs, lie in the frequency domain and have shown good recognition performance compared with the conventional mel-frequency cepstral coefficients (MFCCs) in hidden Markov model (HMM) based systems. In this paper, the FF features are first compared with the MFCCs and the relative spectral perceptual linear prediction (Rasta-PLP) features using both a hybrid HMM/MLP and a conventional HMM/Gaussian mixture model (HMM/GMM) based recognition system, for both clean and noisy speech. Taking advantage of the ability of the hybrid system to deal with correlated features, the inclusion of both the frequency second-derivatives and the raw logFBEs as additional features is proposed and tested. Moreover, the robustness of these features in noisy conditions is enhanced by combining the FF technique with the Rasta temporal filtering approach. Finally, a study of the FF features in the framework of multistream processing is presented. The best recognition results for both clean and noisy speech are obtained from the multistream combination of the J-Rasta-PLP features and the FF features.
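
The abstract describes the FF parameters as frequency derivatives of the logFBEs. A minimal sketch of that idea follows, assuming a derivative-like filter H(z) = z - z^{-1} applied along the band axis with zero padding at both band edges. The filter choice, the padding, and the name `frequency_filter` are illustrative assumptions; this record does not specify them.

```python
import numpy as np

def frequency_filter(log_fbe):
    """Derivative-like filtering of log filter-bank energies along frequency.

    log_fbe: array of shape (frames, bands).
    Assumption: filter H(z) = z - z^{-1}, zeros beyond the first and last band.
    """
    log_fbe = np.asarray(log_fbe, dtype=float)
    # Pad one zero-valued band on each side of the frequency axis.
    padded = np.pad(log_fbe, ((0, 0), (1, 1)), mode="constant")
    # Output band k = (band k+1) - (band k-1), using the padded values at the edges.
    return padded[:, 2:] - padded[:, :-2]

# One frame with four bands, values chosen only for illustration.
features = frequency_filter([[1.0, 2.0, 4.0, 7.0]])
```

The output keeps one value per band, so the features remain indexed by frequency rather than by cepstral order.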

Keywords :

cepstral analysis; filtering theory; hidden Markov models; multilayer perceptrons; speech recognition; Gaussian mixture modeling; HMM/GMM speech recognition system; correlated feature; frequency domain; frequency filtering; frequency-second derivative; hidden Markov model; hybrid HMM/MLP; logarithmic filter-bank energy; mel-frequency cepstral coefficient; multilayer perceptron; multistream processing; noisy speech; relative spectral perceptual linear prediction; temporal filtering; Cepstral analysis; Filtering; Frequency domain analysis; Hidden Markov models; Mel frequency cepstral coefficient; Nonlinear filters; Predictive models; Robustness; Speech recognition; System testing;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/TSA.2004.834466

Filename :

1369308

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1187242