Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition

Author

Ishizuka, Kentaro ; Miyazaki, Noboru

Author_Institution

NTT Commun. Sci. Labs., NTT Corp., Tokyo, Japan

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

This paper proposes a feature extraction method that represents both the periodicity and the aperiodicity of speech for robust speech recognition; its development was motivated by findings in speech perception research. The speech signal is first filtered by a Gammatone filter bank, and the output of each filter is then comb filtered. A comb filter is designed individually for each Gammatone filter output and divides that output into periodic and aperiodic features in the corresponding sub band: the power suppressed by comb filtering is treated as the periodic feature, and the power of the residue after comb filtering is treated as the aperiodic feature. Both features are used as feature parameters for automatic speech recognition. In a preliminary five-vowel recognition experiment comparing the proposed approach with conventional MFCC-based feature extraction, the proposed method improved vowel recognition rates by as much as 14.7% in the presence of pink noise or a harmonic complex tone interferer. In an evaluation on the Aurora-2J database (a Japanese noisy digit recognition database), compared with the conventional MFCC-based (baseline) feature extraction method, the proposed method reduced the word error rate by as much as 59.62%, and by 18.21% on average.
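The per-band periodic/aperiodic split described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, whose filter designs are not given in this record. The following are all hypothetical choices made for illustration: the gammatone bandwidth (1.019 times the Glasberg and Moore ERB), the two-tap comb filter (1 - z^-T)/sqrt(2), the choice of the delay T as the autocorrelation peak within an assumed 70 to 400 Hz F0 range, and the function names `gammatone_ir`, `best_comb_lag`, `split_periodic_aperiodic` and `periodic_aperiodic_features`. The sketch also treats the whole signal as a single frame; a recognizer front end would presumably compute these powers frame by frame and compress them before use.

```python
import numpy as np


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Sampled impulse response of a gammatone filter centred at fc (Hz).

    Assumption: bandwidth b = 1.019 * ERB(fc), using the Glasberg & Moore
    ERB formula; this is an illustrative design, not the paper's.
    """
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # normalise to unit energy


def best_comb_lag(y, fs, f0_min=70.0, f0_max=400.0):
    """Hypothetical per-channel comb design: pick the delay (in samples) that
    maximises the channel's autocorrelation within an assumed F0 range."""
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    r = [np.dot(y[lag:], y[:-lag]) for lag in range(lo, hi + 1)]
    return lo + int(np.argmax(r))


def split_periodic_aperiodic(y, lag):
    """Comb-filter one channel with the assumed filter (1 - z^-lag) / sqrt(2).

    Returns (periodic_power, aperiodic_power): the power removed by the comb
    filter and the power of the residue. The 1/sqrt(2) gain leaves the power
    of uncorrelated noise unchanged, so such noise lands in the aperiodic term.
    """
    residue = (y[lag:] - y[:-lag]) / np.sqrt(2.0)
    p_in = np.mean(y[lag:] ** 2)
    p_res = np.mean(residue ** 2)
    return max(p_in - p_res, 0.0), p_res


def periodic_aperiodic_features(signal, fs, centre_freqs):
    """Return an array of shape (len(centre_freqs), 2) holding the
    [periodic, aperiodic] power of each gammatone sub band."""
    feats = []
    for fc in centre_freqs:
        g = gammatone_ir(fc, fs)
        # "valid" keeps only samples where the filter fully overlaps the
        # input, which drops the filter's onset transient.
        y = np.convolve(signal, g, mode="valid")
        lag = best_comb_lag(y, fs)  # one comb filter designed per channel
        feats.append(split_periodic_aperiodic(y, lag))
    return np.array(feats)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Harmonic complex tone with a 125 Hz fundamental (10 harmonics).
    voiced = sum(np.sin(2 * np.pi * 125 * k * t) for k in range(1, 11))
    feats = periodic_aperiodic_features(voiced, fs, [250.0, 500.0, 1000.0])
    print("periodic fraction per band:", feats[:, 0] / feats.sum(axis=1))
```

For a harmonic complex tone such as the one in the usage block, nearly all of each band's power should fall in the periodic column, whereas white noise leaves a substantial share in the aperiodic column. Because narrowband noise at the output of a low-frequency channel is strongly correlated at short lags, this simple autocorrelation-based comb design still assigns part of that noise to the periodic term.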

Keywords

channel bank filters; comb filters; error statistics; feature extraction; speech recognition; Aurora-2J database; Gammatone filter banks; Japanese noisy digit recognition database; aperiodicity; automatic speech recognition; periodicity; robust speech recognition; speech feature extraction; sub bands; vowel recognition rates; word error rate; 1/f noise; Automatic speech recognition; Feature extraction; Filter bank; Filtering; Power harmonic filters; Robustness; Signal design; Spatial databases; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1325942

Filename

1325942