A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition

Author

Aikawa, Kiyoaki ; Singer, Harald ; Kawahara, Hideki ; Tohkura, Yoh'ichi

Author_Institution

ATR Auditory & Visual Perception Lab., Soraku-gun, Kyoto, Japan

Volume

2

fYear

1993

fDate

27-30 April 1993

Firstpage

668

Abstract

A dynamic cepstrum parameter that incorporates the time-frequency characteristics of auditory forward masking is proposed. A masking model is derived from the results of psychological experiments. A novel operational method using a lifter array is derived to perform the time-frequency masking. The parameter simulates the effective input spectrum at the front end of the auditory system and can enhance the spectral dynamics. It represents both the instantaneous and the transitional aspects of a spectral time series. Phoneme and continuous speech recognition experiments demonstrated that the dynamic cepstrum outperforms the conventional cepstrum, both on its own and in various combinations with other spectral parameters. Phoneme recognition results improved for ten male and ten female speakers. The masking lifter with a Gaussian window performed better than the one with a square window.
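The lifter-array operation named in the abstract can be illustrated with a rough sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the parameter values (alpha, beta, q0, nu), the exponential decay of masking with frame lag, and the choice to subtract Gaussian-liftered past frames from the current cepstrum. It is not the authors' implementation.

```python
import numpy as np

def gaussian_lifter_array(n_lags, n_ceps, alpha=0.2, beta=0.7, q0=20.0, nu=0.8):
    """Build an (n_lags, n_ceps) array of Gaussian lifter weights.

    All parameter values are hypothetical, chosen only for illustration.
    """
    k = np.arange(n_ceps)                 # quefrency index
    lags = np.arange(1, n_lags + 1)       # frame lag 1..n_lags
    gain = alpha * beta ** (lags - 1)     # assumed: masking decays with lag
    width = q0 * nu ** (lags - 1)         # assumed: lifter narrows in quefrency,
                                          # i.e. broader smoothing in frequency
    return gain[:, None] * np.exp(-(k[None, :] ** 2) / (2.0 * width[:, None] ** 2))

def dynamic_cepstrum(ceps, lifters):
    """Subtract liftered past frames from each frame of a cepstral sequence.

    ceps: array of shape (n_frames, n_ceps); lifters: (n_lags, n_ceps).
    """
    ceps = np.asarray(ceps, dtype=float)
    out = ceps.copy()
    for lag, lifter in enumerate(lifters, start=1):
        # Frame i is reduced by the liftered frame i - lag (forward masking).
        out[lag:] -= lifter * ceps[:-lag]
    return out
```

Under these assumptions a stationary input is attenuated after the first few frames while a changing spectrum is left comparatively intact, which is one way to read the abstract's claim that the parameter enhances spectral dynamics.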

Keywords

array signal processing; physiological models; speech recognition; time-frequency analysis; Gaussian window; auditory forward masking; continuous speech recognition; dynamic cepstrum; lifter array; performance; phoneme recognition; spectral time series; time-frequency masking

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on

Conference_Location

Minneapolis, MN, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.1993.319399

Filename

319399