Regional Information Center for Science and Technology - Auditory masking based acoustic front-end for robust speech recognition

DocumentCode :

319597

Title :

Auditory masking based acoustic front-end for robust speech recognition

Author :

Paliwal, K.K. ; Lilly, B.T.

Author_Institution :

Sch. of Microelectron. Eng., Griffith Univ., Brisbane, Qld., Australia

Volume :

fYear :

1997

fDate :

4 Dec. 1997

Firstpage :

165

Abstract :

This paper presents an acoustic front-end that uses the properties of auditory masking to extract acoustic features from the speech signal. Using the properties of simultaneous masking found in the human auditory system, we compute a masking threshold as a function of frequency for a given speech frame from its power spectrum. Portions of the power spectrum that fall below this masking threshold are not heard by the human auditory system because of masking effects and can therefore be discarded. These portions are replaced by the corresponding portions of the masking threshold spectrum. The modified power spectrum is then processed by linear prediction analysis or homomorphic analysis to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition in noisy environments. It performs significantly better than conventional linear prediction or homomorphic analysis based front-ends on noisy speech. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC (linear prediction cepstral coefficient) front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.
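
The processing chain described in the abstract (masking threshold, replacement of the masked portions, cepstral analysis) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not give the masking model, so `toy_masking_threshold` and its `slope_db_per_bin` and `offset_db` parameters are hypothetical stand-ins, and only the homomorphic (cepstral) branch is shown, not the linear prediction branch.

```python
import math


def toy_masking_threshold(power, slope_db_per_bin=10.0, offset_db=12.0):
    """Hypothetical stand-in for a simultaneous-masking model (not the paper's).

    Every bin acts as a masker: its level, lowered by offset_db, decays by
    slope_db_per_bin per bin of distance. The threshold at a bin is the
    strongest contribution reaching it from the other bins.
    """
    n = len(power)
    level_db = [10.0 * math.log10(max(p, 1e-12)) for p in power]
    threshold = []
    for k in range(n):
        best_db = max(
            level_db[j] - offset_db - slope_db_per_bin * abs(k - j)
            for j in range(n)
            if j != k
        )
        threshold.append(10.0 ** (best_db / 10.0))
    return threshold


def apply_masking(power, threshold):
    """Replace spectrum portions below the threshold with the threshold itself."""
    return [max(p, t) for p, t in zip(power, threshold)]


def real_cepstrum(half_power, n_coeffs):
    """Homomorphic analysis: inverse DFT of the log power spectrum.

    half_power holds bins 0..N/2 of an N-point power spectrum, which is
    assumed to be even-symmetric (real input signal).
    """
    half = len(half_power) - 1
    n = 2 * half
    log_p = [math.log(max(p, 1e-12)) for p in half_power]
    ceps = []
    for q in range(n_coeffs):
        acc = log_p[0] + log_p[half] * math.cos(math.pi * q)
        acc += 2.0 * sum(
            log_p[k] * math.cos(2.0 * math.pi * k * q / n) for k in range(1, half)
        )
        ceps.append(acc / n)
    return ceps


# Made-up power spectrum for one frame (bins 0..8 of a 16-point spectrum).
power = [1.0, 100.0, 1.0, 0.001, 0.001, 0.001, 50.0, 0.5, 0.001]
threshold = toy_masking_threshold(power)
modified = apply_masking(power, threshold)
cepstral_features = real_cepstrum(modified, 4)
```

In this toy frame the strong peaks are left untouched, while the weak bins next to them are raised to the threshold, which is the step the abstract credits for the improved noise robustness.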

Keywords :

acoustic signal processing; cepstral analysis; feature extraction; hearing; noise; satellite computers; speech processing; speech recognition; LPCC front-end; SNR; acoustic features extraction; acoustic front-end; auditory masking; cepstral features; continuous speech recognition; homomorphic analysis; human auditory system; isolated word recognition; linear prediction analysis; masking threshold; noisy environments; performance; power spectrum; signal-to-noise ratio; simultaneous masking; speech frame frequency; speech recognition; speech signal; Auditory system; Cepstral analysis; Feature extraction; Humans; Masking threshold; Robustness; Speech analysis; Speech coding; Speech recognition; Working environment noise;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications, Proceedings of IEEE

Conference_Location :

Brisbane, Qld., Australia

Print_ISBN :

0-7803-4365-4

Type :

conf

DOI :

10.1109/TENCON.1997.647283

Filename :

647283

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=319597