DocumentCode
1692720
Title
Using binaural processing for automatic speech recognition in multi-talker scenes
Author
Spille, Constantin ; Dietz, Mathias ; Hohmann, Volker ; Meyer, Bernd T.
Author_Institution
Med. Phys., Carl-von-Ossietzky Univ. Oldenburg, Oldenburg, Germany
fYear
2013
Firstpage
7805
Lastpage
7809
Abstract
The segregation of concurrent speakers and other sound sources is an important aspect of the human auditory system but is missing in most current systems for automatic speech recognition (ASR), resulting in a large gap between human and machine performance. The present study uses a physiologically motivated model of binaural hearing to estimate the position of moving speakers in a noisy environment by combining methods from Computational Auditory Scene Analysis (CASA) and ASR. The binaural model is paired with a particle filter and a beamformer to enhance spoken sentences that are transcribed by the ASR system. An evaluation in a clean, anechoic two-speaker condition shows that the word recognition rate increases from 30.8% to 72.6%, demonstrating the potential of the CASA-based approach. In different noisy environments, improvements were also observed for SNRs of 5 dB and above, which was attributed to average tracking errors that were consistent over a wide range of SNRs.
Keywords
array signal processing; hearing; particle filtering (numerical methods); speaker recognition; ASR; CASA; anechoic two-speaker condition; automatic speech recognition; beamformer; binaural processing; binaural hearing; computational auditory scene analysis; concurrent speaker segregation; human auditory system; human performance; machine performance; multitalker scenes; noisy environment; particle filter; physiologically-motivated model; sound sources; word recognition rates; Direction-of-arrival estimation; Estimation; Hidden Markov models; Signal to noise ratio; Speech; Training; Automatic speech recognition; beamformer; computational auditory scene analysis; particle filter;
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639183
Filename
6639183
Link To Document