Title :
Auditory masking based acoustic front-end for robust speech recognition
Author :
Paliwal, K.K. ; Lilly, B.T.
Author_Institution :
Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia
Abstract :
This paper presents an acoustic front-end that uses the properties of auditory masking to extract acoustic features from the speech signal. Using the properties of simultaneous masking found in the human auditory system, we compute a masking threshold as a function of frequency for a given speech frame from its power spectrum. Portions of the power spectrum that fall below this masking threshold are not heard by the human auditory system because of masking effects, and hence can be discarded. These portions are replaced by the corresponding portions of the masking threshold spectrum. The modified power spectrum is then processed by linear prediction analysis or homomorphic analysis to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition in noisy environments. It performs significantly better than conventional front-ends based on linear prediction or homomorphic analysis for noisy speech. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.
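The processing chain described in the abstract (masking threshold, replacement of sub-threshold spectrum portions, cepstral analysis) can be illustrated with a minimal Python sketch. The abstract does not give the masking model, so `toy_masking_threshold` and its parameters `slope_db_per_bin` and `offset_db` are hypothetical placeholders rather than the authors' psychoacoustic model. All function names are likewise assumptions made for illustration, and only the homomorphic (cepstral) branch is shown, not linear prediction.

```python
import math


def toy_masking_threshold(power, slope_db_per_bin=10.0, offset_db=15.0):
    """Hypothetical stand-in for a simultaneous-masking model.

    Each bin spreads its power to every other bin, attenuated by
    offset_db plus slope_db_per_bin for each bin of distance.
    This is NOT the model used in the paper.
    """
    threshold = []
    for k in range(len(power)):
        total = 0.0
        for j, p in enumerate(power):
            atten_db = offset_db + slope_db_per_bin * abs(k - j)
            total += p * 10.0 ** (-atten_db / 10.0)
        threshold.append(total)
    return threshold


def apply_masking(power, threshold):
    """Replace every bin that lies below the threshold by the threshold."""
    return [max(p, t) for p, t in zip(power, threshold)]


def cepstrum_from_power(power, n_coeffs):
    """Inverse DFT of the log power spectrum (homomorphic analysis).

    `power` is assumed to be a one-sided spectrum of K bins running from
    DC to Nyquist, i.e. taken from a real frame of length N = 2 * (K - 1).
    """
    k_bins = len(power)
    n = 2 * (k_bins - 1)
    log_p = [math.log(max(p, 1e-12)) for p in power]  # floor avoids log(0)
    coeffs = []
    for q in range(n_coeffs):
        acc = 0.0
        for k, lp in enumerate(log_p):
            # Interior bins appear twice in the full symmetric spectrum.
            weight = 1.0 if k in (0, k_bins - 1) else 2.0
            acc += weight * lp * math.cos(math.pi * k * q / (k_bins - 1))
        coeffs.append(acc / n)
    return coeffs


# Illustrative one-sided power spectrum: one strong peak, two weak bins.
frame_power = [1.0, 100.0, 1.0, 1e-6, 1e-6]
threshold = toy_masking_threshold(frame_power)
modified = apply_masking(frame_power, threshold)
features = cepstrum_from_power(modified, 3)
```

In this sketch the strong peak and its neighbours are left unchanged, while the two weak bins are raised to the threshold before the cepstral coefficients are computed. Raising the spectral valleys in this way is the step the abstract credits for the improved robustness to noise.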
Keywords :
acoustic signal processing; cepstral analysis; feature extraction; hearing; noise; satellite computers; speech processing; speech recognition; LPCC front-end; SNR; acoustic features extraction; acoustic front-end; auditory masking; cepstral features; continuous speech recognition; homomorphic analysis; human auditory system; isolated word recognition; linear prediction analysis; masking threshold; noisy environments; performance; power spectrum; signal-to-noise ratio; simultaneous masking; speech frame frequency; speech recognition; speech signal; Auditory system; Cepstral analysis; Feature extraction; Humans; Masking threshold; Robustness; Speech analysis; Speech coding; Speech recognition; Working environment noise;
Conference_Title :
TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications., Proceedings of IEEE
Conference_Location :
Brisbane, Qld., Australia
Print_ISBN :
0-7803-4365-4
DOI :
10.1109/TENCON.1997.647283