• DocumentCode
    1692720
  • Title

    Using binaural processing for automatic speech recognition in multi-talker scenes

  • Author

    Spille, Constantin ; Dietz, Mathias ; Hohmann, Volker ; Meyer, Bernd T.

  • Author_Institution
    Med. Phys., Carl-von-Ossietzky Univ. Oldenburg, Oldenburg, Germany
  • fYear
    2013
  • Firstpage
    7805
  • Lastpage
    7809
  • Abstract
    The segregation of concurrent speakers and other sound sources is an important aspect of the human auditory system but is missing in most current systems for automatic speech recognition (ASR), resulting in a large gap between human and machine performance. The present study uses a physiologically-motivated model of binaural hearing to estimate the position of moving speakers in a noisy environment by combining methods from Computational Auditory Scene Analysis (CASA) and ASR. The binaural model is paired with a particle filter and a beamformer to enhance spoken sentences that are transcribed by the ASR system. Results based on an evaluation in a clean, anechoic two-speaker condition show the word recognition rates to be increased from 30.8% to 72.6%, demonstrating the potential of the CASA-based approach. In different noisy environments, improvements were also observed for SNRs of 5 dB and above, which was attributed to the average tracking errors that were consistent over a wide range of SNRs.
  • Keywords
    array signal processing; hearing; particle filtering (numerical methods); speaker recognition; ASR; CASA; anechoic two-speaker condition; automatic speech recognition; beamformer; binaural processing; binaural hearing; computational auditory scene analysis; concurrent speaker segregation; human auditory system; human performance; machine performance; multitalker scenes; noisy environment; particle filter; physiologically-motivated model; sound sources; word recognition rates; Direction-of-arrival estimation; Estimation; Hidden Markov models; Signal to noise ratio; Speech; Training; Automatic speech recognition; beamformer; computational auditory scene analyses; particle filter;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6639183
  • Filename
    6639183