• DocumentCode
    4611
  • Title

    Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array

  • Author

    Weifeng Li ; Longbiao Wang ; Yicong Zhou ; Dines, John ; Magimai-Doss, Mathew ; Bourlard, Herve ; Qingmin Liao

  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Shenzhen, China
  • Volume
    22
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    2244
  • Lastpage
    2255
  • Abstract
    This paper introduces a nonlinear vector-based feature mapping approach to extract robust features for automatic speech recognition (ASR) of overlapping speech using a microphone array. We explore different configurations and additional sources of information to improve the effectiveness of the feature mapping. First, we investigate the full-vector-based mapping of different sources in the log mel-filterbank energy (log MFBE) domain, and demonstrate that retraining the acoustic model using the generated training data can help improve the recognition performance. Then, we investigate feature mapping between different domains. Finally, to improve the quality of the mapping inputs, we propose a nonlinear mapping of the features from multiple beamformed sources, which are directed at the target and interfering speakers, respectively. We demonstrate the effectiveness of the proposed approach through extensive evaluations on the MONC corpus, which includes non-overlapping single-speaker and overlapping multi-speaker conditions.
  • Keywords
    array signal processing; channel bank filters; feature extraction; microphone arrays; speaker recognition; vectors; ASR; MONC corpus; acoustic model; full-vector based mapping; log MFBE; log mel-filterbank energy; microphone array; multiple beamforming source; nonlinear mapping; nonlinear vector-based feature mapping approach; nonoverlapping single speaker condition; overlapping multispeaker condition; robust overlapping automatic speech recognition; speaker interference; training data generation; Arrays; Feature extraction; Microphones; Speech; Speech enhancement; Speech recognition; Vectors; Beamforming; microphone array; neural network; speech recognition; speech separation;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2364130
  • Filename
    6930804