Using binaural processing for automatic speech recognition in multi-talker scenes

Author

Spille, Constantin ; Dietz, Mathias ; Hohmann, Volker ; Meyer, Bernd T.

Author_Institution

Med. Phys., Carl-von-Ossietzky Univ. Oldenburg, Oldenburg, Germany

fYear

2013

Firstpage

7805

Lastpage

7809

Abstract

The segregation of concurrent speakers and other sound sources is an important aspect of the human auditory system but is missing in most current systems for automatic speech recognition (ASR), resulting in a large gap between human and machine performance. The present study uses a physiologically-motivated model of binaural hearing to estimate the position of moving speakers in a noisy environment by combining methods from Computational Auditory Scene Analysis (CASA) and ASR. The binaural model is paired with a particle filter and a beamformer to enhance spoken sentences, which are then transcribed by the ASR system. Results from an evaluation in a clean, anechoic two-speaker condition show that word recognition rates increased from 30.8% to 72.6%, demonstrating the potential of the CASA-based approach. In different noisy environments, improvements were also observed for SNRs of 5 dB and above, which was attributed to average tracking errors that were consistent over a wide range of SNRs.

Keywords

array signal processing; hearing; particle filtering (numerical methods); speaker recognition; ASR; CASA; anechoic two-speaker condition; automatic speech recognition; beamformer; binaural processing; binaural hearing; computational auditory scene analysis; concurrent speaker segregation; human auditory system; human performance; machine performance; multitalker scenes; noisy environment; particle filter; physiologically-motivated model; sound sources; word recognition rates; Direction-of-arrival estimation; Estimation; Hidden Markov models; Signal to noise ratio; Speech; Training

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639183

Filename

6639183