DocumentCode :
1224483
Title :
A Study of Filter Bank Smoothing in MFCC Features for Recognition of Children's Speech
Author :
Umesh, S. ; Sinha, Rohit
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Kanpur
Volume :
15
Issue :
8
fYear :
2007
Firstpage :
2418
Lastpage :
2430
Abstract :
In this paper, we study the effect of filter bank smoothing on the recognition performance of children's speech. Filter bank smoothing of spectra is done during the computation of the Mel filter bank cepstral coefficients (MFCCs). We study the effect of smoothing both with vocal-tract length normalization (VTLN) and without it. The results from our experiments indicate that, unlike in conventional VTLN implementations, it is better not to scale the bandwidths of the filters during VTLN; only the filter center frequencies need be scaled. Our interpretation of this result is that while the formant center frequencies may approximately scale between speakers, the formant bandwidths do not change significantly. Therefore, scaling the filter bandwidths by a warp-factor during conventional VTLN results in differences in spectral smoothing, leading to degradation in recognition performance. Similarly, results from our experiments indicate that for telephone-based speech without normalization it is better to use uniform-bandwidth filters instead of the constant-Q-like filters that are used in the computation of conventional MFCC. Our interpretation is that with constant-Q filters there is excessive spectral smoothing at higher frequencies, which leads to degradation in performance for children's speech. However, the use of constant-Q filters during VTLN does not create any additional performance degradation. As we will show, during VTLN it is only important that the filter bandwidths are not scaled, irrespective of whether we use constant-Q or uniform-bandwidth filters. With our proposed changes in the filter bank implementation, we get comparable performance for adults and about 6% improvement for children, both with and without VTLN, on a telephone-based digit recognition task.
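The difference between scaling the filter bandwidths and scaling only the center frequencies can be illustrated with a minimal sketch. It is not the authors' implementation: the linear warp by a factor `alpha`, the common 2595·log10(1 + f/700) mel formula, the 0 to 4000 Hz band, the filter count, and the function names `mel_filter_edges` and `warp_filters` are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_edges(n_filters=20, f_low=0.0, f_high=4000.0):
    """Return (lower, center, upper) edges in Hz of mel-spaced triangular filters."""
    pts = mel_to_hz(np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2))
    return pts[:-2], pts[1:-1], pts[2:]

def warp_filters(lower, center, upper, alpha, scale_bandwidth):
    """Apply a hypothetical linear warp by alpha to a triangular filter bank.

    scale_bandwidth=True  : all edges are scaled, so each bandwidth is
                            multiplied by alpha (the conventional behavior
                            described in the abstract).
    scale_bandwidth=False : only the centers are scaled; each filter keeps
                            its original lower and upper offsets in Hz.
    Edges are not clipped to the valid band here; a real front end would
    need to handle filters that move past 0 Hz or the Nyquist frequency.
    """
    new_center = alpha * center
    if scale_bandwidth:
        return alpha * lower, new_center, alpha * upper
    return new_center - (center - lower), new_center, new_center + (upper - center)

lower, center, upper = mel_filter_edges()
conv = warp_filters(lower, center, upper, alpha=1.1, scale_bandwidth=True)
fixed = warp_filters(lower, center, upper, alpha=1.1, scale_bandwidth=False)
print("original bandwidths (Hz):       ", np.round(upper - lower, 1)[:3])
print("bandwidths, all edges scaled:   ", np.round(conv[2] - conv[0], 1)[:3])
print("bandwidths, centers-only warp:  ", np.round(fixed[2] - fixed[0], 1)[:3])
```

Both variants place the filter centers at the same warped frequencies; they differ only in how much of the spectrum each filter averages over, which is the smoothing effect the abstract discusses.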
Keywords :
cepstral analysis; smoothing methods; spectral analysis; speech recognition; Mel filter bank cepstral coefficients; children speech recognition; spectral smoothing; telephone-based digit recognition task; telephone-based speech; uniform-bandwidth filter; vocal-tract length normalization; warp-factor; Automatic speech recognition; Availability; Bandwidth; Cepstral analysis; Degradation; Filter bank; Mel frequency cepstral coefficient; Smoothing methods; Speech recognition; Strontium; Children's speech recognition; Mel filter bank; vocal-tract length normalization
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.906194
Filename :
4317580