Acoustic model combination to compensate for residual noise in multi-channel source separation

Author

Yoon, Jae Sam ; Park, Ji Hun ; Kim, Hong Kook

Author_Institution

Dept. of Inf. & Commun., Gwangju Inst. of Sci. & Technol. (GIST), Gwangju

fYear

2009

fDate

19-24 April 2009

Firstpage

3925

Lastpage

3928

Abstract

In this paper, we propose an acoustic model combination technique for reducing acoustic mismatch in a multi-channel noisy environment. To this end, we first apply a mask-based multi-channel source separation method, typically one based on computational auditory scene analysis (CASA), to separate the speech source from noise. However, some noise remains in the separated speech, especially under low signal-to-noise ratio (SNR) conditions, because the estimated mask is not ideal. This residual noise limits the performance of automatic speech recognition (ASR). To improve ASR performance, the remaining noise can be further compensated in the acoustic model domain within the framework of parallel model combination (PMC). In particular, the noise model for PMC is estimated from the noise remaining after mask-based source separation, and the SNR for PMC is estimated from the average relative magnitude of the mask over the utterance. Experiments show that the proposed acoustic model combination method achieves a relative word error rate reduction of 52.14% compared with mask-based source separation alone.
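As a rough illustration of the compensation step summarized in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a soft mask with values in [0, 1], a hypothetical SNR estimate derived from the mean mask value, and the standard log-add approximation of PMC applied to log-spectral mean vectors. The function names `estimate_snr_db` and `log_add_combine`, the `floor` parameter, and the mask-to-SNR formula are all assumptions introduced for illustration; the record does not give the paper's exact formulas.

```python
import math


def estimate_snr_db(mask, floor=1e-3):
    """Hypothetical utterance-level SNR estimate from a soft mask.

    Assumption (not from the paper): the mean mask value m approximates
    the fraction of energy that belongs to speech, so that
    SNR ~ 10 * log10(m / (1 - m)).
    `mask` is a list of frames, each a list of values in [0, 1].
    """
    values = [v for frame in mask for v in frame]
    m = sum(values) / len(values)
    # Clamp so the ratio stays finite for all-zero or all-one masks.
    m = min(max(m, floor), 1.0 - floor)
    return 10.0 * math.log10(m / (1.0 - m))


def log_add_combine(speech_mean, noise_mean, snr_db):
    """Log-add approximation of PMC for log-spectral mean vectors.

    The noise mean is scaled by a gain g so that the energy ratio of the
    speech and noise mean vectors equals the target SNR; the means are
    then combined per channel as log(exp(speech) + g * exp(noise)).
    Variances are left uncompensated in this sketch.
    """
    speech_energy = sum(math.exp(s) for s in speech_mean)
    noise_energy = sum(math.exp(n) for n in noise_mean)
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = speech_energy / (noise_energy * target_ratio)
    return [
        math.log(math.exp(s) + gain * math.exp(n))
        for s, n in zip(speech_mean, noise_mean)
    ]


if __name__ == "__main__":
    # Toy data: a 2-frame, 3-channel mask and 3-channel log-spectral means.
    mask = [[0.9, 0.8, 0.7], [0.95, 0.6, 0.85]]
    snr_db = estimate_snr_db(mask)
    clean_mean = [2.0, 1.5, 1.0]     # clean-speech model mean (toy values)
    residual_mean = [0.5, 0.4, 0.3]  # residual-noise model mean (toy values)
    print(snr_db, log_add_combine(clean_mean, residual_mean, snr_db))
```

In this sketch a higher estimated SNR shrinks the noise gain, so the combined mean stays close to the clean-speech mean, whereas a low SNR pulls it toward the residual-noise model.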

Keywords

acoustic signal processing; source separation; speech recognition; ASR performance; PMC approach; acoustic model combination method; automatic speech recognition; computational auditory scene analysis; mask-based multichannel source separation; parallel model combination; residual noise; Acoustic noise; Automatic speech recognition; Image analysis; Noise reduction; Signal to noise ratio; Source separation; Speech analysis; Speech coding; Speech enhancement; Working environment noise; Speech recognition; mask-based SNR estimation; mask-based noise model estimation; multi-channel source separation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on

Conference_Location

Taipei

ISSN

1520-6149

Print_ISBN

978-1-4244-2353-8

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2009.4960486

Filename

4960486