DocumentCode
2023267
Title
A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition
Author
Aikawa, Kiyoaki ; Singer, Harald ; Kawahara, Hideki ; Tohkura, Yoh'ichi
Author_Institution
ATR Auditory & Visual Perception Lab., Soraku-gun, Kyoto, Japan
Volume
2
fYear
1993
fDate
27-30 April 1993
Firstpage
668
Abstract
A dynamic cepstrum parameter that incorporates the time-frequency characteristics of auditory forward masking is proposed. A masking model is derived from psychological experimental results. A novel operational method using a lifter array is derived to perform the time-frequency masking. The parameter simulates the effective input spectrum at the front-end of the auditory system and can enhance the spectral dynamics. The parameter represents both the instantaneous and transitional aspects of a spectral time series. Phoneme and continuous speech recognition experiments demonstrated that the dynamic cepstrum outperforms the conventional cepstrum individually and in various combinations with other spectral parameters. The phoneme recognition results were improved for ten male and ten female speakers. The masking lifter with a Gaussian window provided a better performance than that with a square window.
Keywords
array signal processing; physiological models; speech recognition; time-frequency analysis; Gaussian window; auditory forward masking; continuous speech recognition; dynamic cepstrum; lifter array; performance; phoneme recognition; spectral time series; time-frequency masking
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location
Minneapolis, MN, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.1993.319399
Filename
319399