Noise suppression and loudness normalization in an auditory model-based acoustic front-end

Author

Vereecken, Halewijn ; Martens, Jean-Pierre

Author_Institution

ELIS, Ghent Univ., Belgium

Volume

1

fYear

1996

fDate

3-6 Oct 1996

Firstpage

566

Abstract

It is commonly acknowledged that additive noise, convolutional noise, and speech level variations can seriously degrade the performance of a speech recognizer. In the case considered here, where an auditory model is used as the acoustic front-end, it turns out that compensation techniques such as spectral subtraction and log-spectral mean subtraction can be outperformed by time-domain techniques operating on the band-pass filtered signals that are supplied to the haircell models. In our earlier paper (1995) we showed that additive noise could be removed effectively by means of center clippers placed in front of the haircell models. This technique, which was called linear noise magnitude subtraction (NMS), is further improved in this paper. The nonlinear NMS proposed here outperforms the linear one, especially at low signal-to-noise ratios. To compensate for speech level variations and convolutional noise, we have adopted the same philosophy: remove the effects before the signal is supplied to the haircell models. This is accomplished by introducing normalization gains in front of the haircell models. It is shown that this loudness mean normalization (LMN) technique, when used in combination with NMS, offers a highly robust speech representation.
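The two time-domain operations named in the abstract, a center clipper and a per-band normalization gain, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the subtractive form of the clipper, the fixed threshold, the mean-magnitude level estimate, and the names `center_clip`, `normalization_gain`, and `normalize` are all assumptions made for illustration, and the paper's nonlinear NMS variant is not reproduced here.

```python
def center_clip(samples, threshold):
    """Subtractive center clipper (assumed form): reduce each sample's
    magnitude by `threshold`, setting anything below it to zero."""
    out = []
    for x in samples:
        mag = abs(x) - threshold
        if mag <= 0.0:
            out.append(0.0)          # sample lies inside the clipping zone
        else:
            out.append(mag if x > 0 else -mag)  # keep the original sign
    return out


def normalization_gain(samples, target_level=1.0, floor=1e-12):
    """Gain that brings the mean magnitude of one band to `target_level`.
    The mean-magnitude estimate is an assumption, not the paper's rule."""
    if not samples:
        return 1.0
    mean_mag = sum(abs(x) for x in samples) / len(samples)
    return target_level / max(mean_mag, floor)


def normalize(samples, target_level=1.0):
    """Apply the normalization gain to one band-pass filtered signal."""
    gain = normalization_gain(samples, target_level)
    return [gain * x for x in samples]


# Hypothetical band signal: clip first, then normalize the level.
band = [0.5, -0.2, 0.05, -1.0]
clipped = center_clip(band, threshold=0.1)
leveled = normalize(clipped)
```

In this sketch both steps act on a single band-pass filtered signal before it would reach a haircell model, which mirrors the ordering the abstract describes; how the threshold and the gain are estimated in practice is left open.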

Keywords

acoustic signal processing; compensation; hearing; natural language interfaces; noise; speech recognition; additive noise; auditory model-based acoustic front-end; band-pass filtered signals; center clippers; compensation techniques; convolutional noise; haircell models; highly robust speech representation; linear noise magnitude subtraction; log-spectral mean subtraction; loudness mean normalization technique; loudness normalization; low signal-to-noise ratios; noise suppression; spectral subtraction; speech level variations; speech recognizer; time-domain techniques; Acoustic noise; Additive noise; Band pass filters; Convolution; Noise level; Robustness; Signal to noise ratio; Speech enhancement; Speech recognition; Time domain analysis

fLanguage

English

Publisher

IEEE

Conference_Title

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607180

Filename

607180