Regional Information Center for Science and Technology (RICEST) - Mel-scaled discrete wavelet coefficients for speech recognition

DocumentCode :

2321165

Title :

Mel-scaled discrete wavelet coefficients for speech recognition

Author :

Gowdy, J.N. ; Tufekci, Z.

Author_Institution :

Dept. of Electr. & Comput. Eng., Clemson Univ., SC, USA

Volume :

fYear :

2000

fDate :

2000

Firstpage :

1351

Abstract :

In this paper we propose a new feature vector consisting of mel-frequency discrete wavelet coefficients (MFDWC). The MFDWC are obtained by applying the discrete wavelet transform (DWT) to the mel-scaled log filterbank energies of a speech frame. The purpose of using the DWT is to benefit from its localization property in the time and frequency domains. MFDWC are similar to subband-based (SUB) features and multi-resolution (MULT) features in that all three attempt to achieve good time and frequency localization. However, MFDWC have better time/frequency localization than SUB features and MULT features. We evaluated the performance of the new features for clean speech and noisy speech and compared the performance of MFDWC with mel-frequency cepstral coefficients (MFCC), SUB features, and MULT features. Experimental results on a phoneme recognition task showed that an MFDWC-based recognizer gave better results than recognizers based on MFCC, SUB features, and MULT features for the white Gaussian noise, band-limited white Gaussian noise, and clean speech cases.
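The pipeline described in the abstract (mel-scaled log filterbank energies of one frame, followed by a DWT) can be illustrated with the following minimal sketch. It is not the authors' implementation: the abstract does not state the wavelet, the number of filters, the frame length, or the windowing, so the Haar wavelet, 32 filters, a 512-point FFT, a 16 kHz sampling rate, and a Hamming window are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract does not give one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def haar_dwt(x):
    """Full-depth orthonormal Haar DWT of a vector whose length is a power of two.

    Output order: coarsest approximation first, then details from coarse to fine.
    """
    approx = np.asarray(x, dtype=float)
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1])


def mfdwc_sketch(frame, sample_rate=16000, n_filters=32, n_fft=512):
    """Illustrative MFDWC-style features for one speech frame (assumed settings)."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    # The DWT is applied along the mel-frequency axis of a single frame.
    return haar_dwt(log_energies)
```

Calling `mfdwc_sketch(frame)` on a 400-sample frame returns 32 coefficients. MFCC would apply a discrete cosine transform at the last step instead; the DWT basis functions are localized along the mel-frequency axis, which is the localization property the abstract refers to.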

Keywords :

Gaussian noise; channel bank filters; discrete wavelet transforms; speech recognition; time-frequency analysis; Mel-scaled discrete wavelet coefficients; band-limited white Gaussian noise; clean speech; discrete wavelet transform; feature vector; frequency domain; mel-frequency cepstral coefficients; mel-frequency discrete wavelet coefficients; mel-scaled log filterbank energies; noisy speech; performance; phoneme recognition task; speech frame; speech recognition; time domain; time/frequency localization; white Gaussian noise; Cepstral analysis; Discrete wavelet transforms; Filter bank; Frequency domain analysis; Gaussian noise; Mel frequency cepstral coefficient; Speech analysis; Speech enhancement; Speech recognition; Wavelet coefficients

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location :

Istanbul

ISSN :

1520-6149

Print_ISBN :

0-7803-6293-4

Type :

conf

DOI :

10.1109/ICASSP.2000.861829

Filename :

861829

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2321165