• DocumentCode
    44075
  • Title
    A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition
  • Author
    Nemala, Sridhar Krishna ; Patil, Kailash ; Elhilali, Mounya
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
  • Volume
    21
  • Issue
    2
  • fYear
    2013
  • fDate
    Feb. 2013
  • Firstpage
    416
  • Lastpage
    426
  • Abstract
    There is strong neurophysiological evidence suggesting that processing of speech signals in the brain happens along parallel paths which encode complementary information in the signal. These parallel streams are organized around a duality of slow vs. fast: coarse signal dynamics appear to be processed separately from rapidly changing modulations in both the spectral and temporal dimensions. We adapt this duality in a multistream framework for robust speaker-independent phoneme recognition. The scheme presented here centers around a multi-path bandpass modulation analysis of speech sounds with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations along the spectral and temporal dimensions, the proposed scheme avoids the classic feature explosion problem of previous multistream approaches while maintaining the advantage of parallelism and localized feature analysis. The proposed architecture results in substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in the presence of nonstationary noise, reverberation, and channel distortions.
  • Keywords
    band-pass filters; modulation; reverberation; speech recognition; bandpass modulation filtering; channel distortions; complementary information; feature explosion; localized feature analysis; multipath bandpass modulation; multistream feature framework; neurophysiological evidence; nonstationary noise; robust speech recognition; signal dynamics; speaker independent phoneme recognition; spectral modulation; temporal modulation; Frequency modulation; Spectrogram; Speech; Speech processing; Speech recognition; Time frequency analysis; Auditory cortex; automatic speech recognition (ASR); multistream; speech parameterization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2219526
  • Filename
    6305465