• DocumentCode
    417128
  • Title
    Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
  • Author
    Ishizuka, Kentaro; Miyazaki, Noboru
  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Tokyo, Japan
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    This paper proposes a feature extraction method that represents both the periodicity and aperiodicity of speech for robust speech recognition. The development of this feature extraction method was motivated by findings in speech perception research. With this method, the speech sound is filtered by Gammatone filter banks, and then the output of each filter is comb filtered. Individual comb filters, designed for each output signal of the Gammatone filter bank, divide the output of each filter into its periodic and aperiodic features in the sub band. The power suppressed by comb filtering is considered to be a periodic feature, whereas the power of the residue after comb filtering is considered to be an aperiodic feature. This method uses both features as the feature parameters for automatic speech recognition. A preliminary experiment using a five-vowel recognition task, designed to compare the proposed approach with the conventional MFCC-based feature extraction method, shows that the proposed method improves vowel recognition rates by as much as 14.7% in the presence of pink noise or a harmonic complex tone interferer. An evaluation experiment using the Aurora-2J database (a Japanese noisy digit recognition database) to compare the proposed approach with the conventional MFCC-based (baseline) feature extraction method shows that the proposed method reduces the word error rate by as much as 59.62%, with an average reduction of 18.21%.
  • Keywords
    channel bank filters; comb filters; error statistics; feature extraction; speech recognition; Aurora-2J database; Gammatone filter banks; Japanese noisy digit recognition database; aperiodicity; automatic speech recognition; periodicity; robust speech recognition; speech feature extraction; sub bands; vowel recognition rates; word error rate; 1/f noise; Automatic speech recognition; Feature extraction; Filter bank; Filtering; Power harmonic filters; Robustness; Signal design; Spatial databases; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1325942
  • Filename
    1325942
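
The sub-band processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions not stated in the record: the fundamental frequency `f0` is taken as known, the same comb delay is used in every sub band (the abstract says the comb filters are designed individually per filter output), the Gammatone filter is a truncated fourth-order FIR approximation using the common ERB bandwidth formula, and the comb residue power is halved to offset the gain of the difference filter on uncorrelated input. The names `gammatone_ir`, `comb_residue`, and `subband_features` are hypothetical.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.032, order=4):
    """Truncated Gammatone impulse response at centre frequency fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth (Hz)
    b = 1.019 * erb                            # commonly used bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))         # unit-energy normalisation

def comb_residue(x, period):
    """Comb filter y[n] = x[n] - x[n - period]; cancels harmonics of fs / period."""
    return x[period:] - x[:-period]

def subband_features(x, fs, centre_freqs, f0):
    """Return one (periodic_power, aperiodic_power) row per sub band.

    Assumption: f0 is known and shared by all sub bands.
    """
    period = int(round(fs / f0))               # comb delay in samples
    feats = []
    for fc in centre_freqs:
        # "valid" keeps only samples where the FIR filter has fully settled
        band = np.convolve(x, gammatone_ir(fc, fs), mode="valid")
        total = np.mean(band ** 2)
        residue = comb_residue(band, period)
        # Assumption: halve the residue power, since the difference of two
        # uncorrelated samples has twice the power of either one.
        aperiodic = min(0.5 * np.mean(residue ** 2), total)
        periodic = total - aperiodic           # power suppressed by the comb filter
        feats.append((periodic, aperiodic))
    return np.array(feats)
```

Under these assumptions, a harmonic complex whose period is a whole number of samples yields near-zero aperiodic power in every sub band, while white noise in a sufficiently wide sub band leaves most of its power in the aperiodic feature. A recognizer front end would typically compress such powers (for example with a logarithm) before use; that step is not described in the record and is left out here.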