DocumentCode :
417128
Title :
Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
Author :
Ishizuka, Kentaro ; Miyazaki, Noboru
Author_Institution :
NTT Commun. Sci. Labs., NTT Corp., Tokyo, Japan
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
This paper proposes a feature extraction method that represents both the periodicity and the aperiodicity of speech for robust speech recognition. The development of this method was motivated by findings in speech perception research. In this method, the speech signal is first filtered by a Gammatone filter bank, and the output of each filter is then comb filtered. A comb filter designed individually for each Gammatone filter output divides that output into its periodic and aperiodic features in the sub band. The power suppressed by comb filtering is treated as the periodic feature, and the power of the residue after comb filtering is treated as the aperiodic feature. Both features are used as feature parameters for automatic speech recognition. A preliminary five-vowel recognition experiment comparing the proposed approach with the conventional MFCC-based feature extraction method shows that the proposed method improves vowel recognition rates by as much as 14.7% in the presence of pink noise or a harmonic complex tone interferer. An evaluation experiment on the Aurora-2J database (a Japanese noisy digit recognition database), comparing the proposed approach with the conventional MFCC-based (baseline) feature extraction method, shows that the proposed method reduces the word error rate by as much as 59.62%, and by 18.21% on average.
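The abstract describes the pipeline only at a high level (Gammatone filter bank, then a per-channel comb filter whose suppressed power and residue power become the periodic and aperiodic features). The Python sketch below illustrates that idea for a single analysis frame under assumptions that are NOT taken from the paper: a standard 4th-order gammatone impulse response with Glasberg & Moore ERB bandwidths, a comb lag chosen per channel by a normalized-autocorrelation search over an 80 to 400 Hz pitch range, and an FIR comb y[n] = (x[n] - x[n-T]) / sqrt(2). All function names (`erb`, `gammatone_ir`, `fir_filter`, `estimate_period`, `periodic_aperiodic_power`, `subband_features`) are hypothetical, and plain Python is used for readability rather than speed.

```python
import math


def erb(fc):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Truncated 4th-order gammatone impulse response, scaled to unit peak."""
    b = 1.019 * erb(fc)
    h = []
    for i in range(int(duration * fs)):
        t = i / fs
        h.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                 * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in h)
    return [v / peak for v in h]


def fir_filter(x, h):
    """Causal FIR filtering y[n] = sum_k h[k] * x[n - k]; output has len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


def estimate_period(x, min_lag, max_lag):
    """Lag (samples) with the highest normalized autocorrelation in [min_lag, max_lag].

    Assumption: this is one simple way to tune each channel's comb filter;
    the paper's own design procedure may differ.
    """
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        den = math.sqrt(sum(v * v for v in x[lag:]) * sum(v * v for v in x[:-lag]))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def periodic_aperiodic_power(x, period):
    """Split the power of one sub-band signal with an FIR comb filter.

    Residue: y[n] = (x[n] - x[n - T]) / sqrt(2). The 1/sqrt(2) gain (an
    assumption) makes the comb's power gain average 1 over frequency, so
    white noise passes with roughly unchanged power while components at
    multiples of fs / T are cancelled.
    Returns (periodic, aperiodic): aperiodic = residue power; periodic =
    power suppressed by the comb (input power minus residue power, >= 0).
    """
    residue = [(x[n] - x[n - period]) / math.sqrt(2.0) for n in range(period, len(x))]
    p_in = sum(v * v for v in x[period:]) / len(residue)
    p_res = sum(v * v for v in residue) / len(residue)
    return max(p_in - p_res, 0.0), p_res


def subband_features(signal, fs, center_freqs, f0_range=(80.0, 400.0)):
    """(periodic, aperiodic) power pair for each gammatone channel of one frame."""
    min_lag, max_lag = int(fs / f0_range[1]), int(fs / f0_range[0])
    feats = []
    for fc in center_freqs:
        band = fir_filter(signal, gammatone_ir(fc, fs))
        period = estimate_period(band, min_lag, max_lag)  # per-channel comb tuning
        feats.append(periodic_aperiodic_power(band, period))
    return feats


if __name__ == "__main__":
    fs = 8000
    # Harmonic complex tone: f0 = 125 Hz, harmonics 1-10, 0.2 s long.
    tone = [sum(math.sin(2 * math.pi * k * 125 * n / fs) for k in range(1, 11))
            for n in range(int(0.2 * fs))]
    for fc, (per, aper) in zip([500, 1000], subband_features(tone, fs, [500, 1000])):
        print(f"{fc} Hz: periodic={per:.4f} aperiodic={aper:.4f}")
```

For a harmonic complex tone the periodic power should dominate in each channel, whereas applying `periodic_aperiodic_power` directly to white noise puts nearly all of the power into the residue. A practical front end would additionally process short overlapping frames and use many more channels; those steps, and any post-processing of the powers before recognition, are omitted here.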
Keywords :
channel bank filters; comb filters; error statistics; feature extraction; speech recognition; Aurora-2J database; Gammatone filter banks; Japanese noisy digit recognition database; aperiodicity; automatic speech recognition; periodicity; robust speech recognition; speech feature extraction; sub bands; vowel recognition rates; word error rate; 1/f noise; Automatic speech recognition; Feature extraction; Filter bank; Filtering; Power harmonic filters; Robustness; Signal design; Spatial databases; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1325942
Filename :
1325942