Regional Information Center for Science and Technology - A Study of Filter Bank Smoothing in MFCC Features for Recognition of Children's Speech

DocumentCode :

1224483

Title :

A Study of Filter Bank Smoothing in MFCC Features for Recognition of Children's Speech

Author :

Umesh, S. ; Sinha, Rohit

Author_Institution :

Dept. of Electr. Eng., Indian Inst. of Technol., Kanpur

Volume :

Issue :

fYear :

2007

Firstpage :

2418

Lastpage :

2430

Abstract :

In this paper, we study the effect of filter bank smoothing on the recognition performance of children's speech. Filter bank smoothing of spectra is done during the computation of the Mel filter bank cepstral coefficients (MFCCs). We study the effect of smoothing both for the case when there is vocal-tract length normalization (VTLN) and for the case when there is no VTLN. The results from our experiments indicate that, unlike in conventional VTLN implementations, it is better not to scale the bandwidths of the filters during VTLN; only the filter center frequencies need be scaled. Our interpretation of this result is that while the formant center frequencies may approximately scale between speakers, the formant bandwidths do not change significantly. Therefore, scaling the filter bandwidths by a warp-factor during conventional VTLN produces differences in spectral smoothing, which degrade recognition performance. Similarly, results from our experiments indicate that for telephone-based speech, when there is no normalization, it is better to use uniform-bandwidth filters instead of the constant-Q-like filters that are used in the computation of conventional MFCC. Our interpretation is that with constant-Q filters there is excessive spectral smoothing at higher frequencies, which degrades performance for children's speech. However, the use of constant-Q filters during VTLN does not create any additional performance degradation. As we will show, during VTLN it is only important that the filter bandwidths are not scaled, irrespective of whether we use constant-Q or uniform-bandwidth filters. With our proposed changes in the filter bank implementation, we get comparable performance for adults and about 6% improvement for children, both with and without VTLN, on a telephone-based digit recognition task.
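The distinction the abstract draws between scaling only the filter center frequencies and scaling the bandwidths as well can be illustrated with a minimal sketch. This is not the authors' implementation. The linear warp `alpha * f`, the symmetric triangular filter shape, the common 2595/700 mel formula, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def hz_to_mel(f):
    # A commonly used mel mapping (assumed; the paper may use another).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers_and_widths(n_filters, f_low, f_high):
    """Centers and half-bandwidths (Hz) of a mel-spaced filter bank."""
    edges = mel_to_hz(np.linspace(hz_to_mel(f_low), hz_to_mel(f_high),
                                  n_filters + 2))
    centers = edges[1:-1]
    # Half of the span between each filter's neighbouring edges.
    half_bw = (edges[2:] - edges[:-2]) / 2.0
    return centers, half_bw

def warped_bank(centers, half_bw, alpha, scale_bandwidth):
    """Apply a hypothetical linear VTLN warp with factor alpha.

    scale_bandwidth=True  : bandwidths scale with alpha (conventional case).
    scale_bandwidth=False : only the centers move (the case the abstract favours).
    """
    new_centers = alpha * centers
    new_half_bw = alpha * half_bw if scale_bandwidth else half_bw.copy()
    return new_centers, new_half_bw

def triangular_weights(freqs, centers, half_bw):
    """Symmetric triangles: 1 at the center, 0 at center +/- half_bw."""
    w = 1.0 - np.abs(freqs[None, :] - centers[:, None]) / half_bw[:, None]
    return np.clip(w, 0.0, None)

# Example values are assumptions: 20 filters over 0-4000 Hz, warp factor 1.1.
centers, half_bw = mel_centers_and_widths(20, 0.0, 4000.0)
conv_c, conv_bw = warped_bank(centers, half_bw, 1.1, scale_bandwidth=True)
prop_c, prop_bw = warped_bank(centers, half_bw, 1.1, scale_bandwidth=False)
```

In both variants the filters are centered at the same warped frequencies; only the amount of spectral smoothing differs. Passing a constant array as `half_bw` would give a uniform-bandwidth bank in the same framework.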

Keywords :

cepstral analysis; smoothing methods; spectral analysis; speech recognition; Mel filter bank cepstral coefficients; children speech recognition; spectral smoothing; telephone-based digit recognition task; telephone-based speech; uniform-bandwidth filter; vocal-tract length normalization; warp-factor; Automatic speech recognition; Availability; Bandwidth; Cepstral analysis; Degradation; Filter bank; Mel frequency cepstral coefficient; Smoothing methods; Speech recognition; Strontium; Children's speech recognition; Mel filter bank; vocal-tract length normalization;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2007.906194

Filename :

4317580

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1224483