Multi-Microphone Speech Dereverberation and Noise Reduction Using Relative Early Transfer Functions

Author

Schwartz, Ofer ; Gannot, Sharon ; Habets, Emanuel A. P.

Author_Institution

Fac. of Eng., Bar-Ilan Univ., Ramat-Gan, Israel

Volume

Issue

fYear

2015

fDate

Feb. 2015

Firstpage

240

Lastpage

251

Abstract

In speech communication systems, the microphone signals are degraded by reverberation and ambient noise. The reverberant speech can be separated into two components, namely, an early speech component that includes the direct path and some early reflections, and a late reverberant component that includes all the late reflections. In this paper, a novel algorithm to simultaneously suppress early reflections, late reverberation and ambient noise is presented. A multi-microphone minimum mean square error estimator is used to obtain a spatially filtered version of the early speech component. The estimator is constructed as a minimum variance distortionless response (MVDR) beamformer (BF) followed by a postfilter (PF). Three unique design features characterize the proposed method. First, the MVDR BF is implemented in a special structure, named the nonorthogonal generalized sidelobe canceller (NO-GSC). Compared with the more conventional orthogonal GSC structure, the new structure allows for a simpler implementation of the GSC blocks for various MVDR constraints. Second, in contrast to earlier works, relative early transfer functions (RETFs) are used in the MVDR criterion rather than either the entire relative transfer functions (RTFs) or only the direct path of the desired speech signal. An estimator of the RETFs is proposed as well. Third, the late reverberation and noise are processed by both the beamforming stage and the PF stage. Since the relative power of the noise and the late reverberation varies with the frame index, a computationally efficient method for the required matrix inversion is proposed to circumvent the cumbersome mathematical operation. The algorithm was evaluated and compared with two alternative multichannel algorithms and one single-channel algorithm using simulated data and data recorded in a room with a reverberation time of 0.5 s for various source-microphone array distances (1-4 m) and several signal-to-noise levels. The processed signals were tested using two commonly used objective measures, namely perceptual evaluation of speech quality and log-spectral distance. As an additional objective measure, the improvement in word accuracy percentage of an acoustic speech recognition system is also demonstrated.
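
To make the MVDR criterion mentioned in the abstract concrete, the following is a minimal sketch of the textbook closed-form MVDR weights, w = Φ⁻¹g / (gᴴΦ⁻¹g), for a single frequency bin. It is not the paper's NO-GSC implementation, and it omits the postfilter, the RETF estimator and the efficient matrix inversion. The names `solve`, `mvdr_weights` and `apply_beamformer`, the interference covariance `phi` and the transfer-function vector `g` are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def mvdr_weights(phi, g):
    """Textbook MVDR weights: w = phi^{-1} g / (g^H phi^{-1} g).

    phi: Hermitian positive-definite covariance of noise plus late reverberation
         (assumed known here; in practice it must be estimated).
    g:   vector of relative transfer functions, first entry 1 (reference microphone).
    """
    u = solve(phi, g)  # u = phi^{-1} g
    denom = sum(gi.conjugate() * ui for gi, ui in zip(g, u))  # g^H phi^{-1} g
    return [ui / denom for ui in u]


def apply_beamformer(w, y):
    """Beamformer output w^H y for one time-frequency bin."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))


# Illustrative values for a three-microphone array (made up for this sketch).
phi = [
    [2.0, 0.5 + 0.1j, 0.2],
    [0.5 - 0.1j, 1.5, 0.3j],
    [0.2, -0.3j, 1.8],
]
g = [1.0 + 0j, 0.5 - 0.5j, -0.3 + 0.8j]
w = mvdr_weights(phi, g)
```

With these weights, a signal arriving through `g` passes undistorted (`apply_beamformer(w, g)` equals 1 up to rounding error), while the output power contributed by `phi` is minimized. That is the distortionless-response constraint the abstract refers to.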

Keywords

array signal processing; mean square error methods; microphone arrays; reverberation; speech recognition; speech synthesis; GSC blocks; MVDR BF; acoustic speech recognition system; ambient noise; beamformer; log-spectral distance; microphone signals; minimum variance distortionless response; multimicrophone minimum mean square error estimator; multimicrophone speech dereverberation; noise reduction; nonorthogonal generalized sidelobe canceller; orthogonal GSC structure; postfilter; reflections; relative early transfer functions; reverberant speech; reverberation time; signal-to-noise levels; single-channel algorithm; source-microphone array; speech communication systems; speech component; speech quality; speech signal; Microphones; Noise; Reverberation; Speech; Speech processing; Transfer functions; Dereverberation; generalized sidelobe canceller; minimum variance distortionless response (MVDR) beamforming; multichannel Wiener filter; relative transfer function;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2372335

Filename

6963314

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=49233