  • DocumentCode
    1049306
  • Title

    A Generalized Time–Frequency Subtraction Method for Robust Speech Enhancement Based on Wavelet Filter Banks Modeling of Human Auditory System

  • Author

    Shao, Yu ; Chang, Chip-Hong

  • Author_Institution
    Nanyang Technol. Univ.
  • Volume
    37
  • Issue
    4
  • fYear
    2007
  • Firstpage
    877
  • Lastpage
    889
  • Abstract
    We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
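    The adaptive step described in the abstract, where a masking threshold steers the subtraction parameters, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `adapt` and `subtract`, the linear interpolation rule, the parameter ranges, and the use of plain per-band coefficients in place of perceptual wavelet packet coefficients. The idea shown is only the general one: where the masking threshold is high, residual noise is less audible, so less is subtracted; where it is low, subtraction is more aggressive.

```python
import math

def adapt(threshold, t_min, t_max, p_min, p_max):
    """Map a masking threshold to a subtraction parameter (hypothetical rule).

    A high threshold (noise well masked) gives a value near p_min;
    a low threshold gives a value near p_max. Linear interpolation
    is an assumption, not the paper's formula.
    """
    if t_max == t_min:
        return p_max
    frac = (t_max - threshold) / (t_max - t_min)
    return p_min + frac * (p_max - p_min)

def subtract(noisy, noise, thresholds,
             alpha_range=(1.0, 6.0), beta_range=(0.0, 0.02), gamma=2.0):
    """Generalized subtraction on per-band coefficients.

    noisy      : noisy coefficients (one per band, illustrative)
    noise      : noise magnitude estimates for the same bands
    thresholds : masking threshold per band (assumed already computed)
    alpha      : over-subtraction factor, beta : noise floor factor,
    gamma      : exponent (2.0 corresponds to power subtraction)
    """
    t_min, t_max = min(thresholds), max(thresholds)
    enhanced = []
    for y, n, t in zip(noisy, noise, thresholds):
        alpha = adapt(t, t_min, t_max, *alpha_range)
        beta = adapt(t, t_min, t_max, *beta_range)
        y_p = abs(y) ** gamma
        n_p = abs(n) ** gamma
        # Subtract the scaled noise estimate, but never fall below the floor.
        magnitude = max(y_p - alpha * n_p, beta * n_p) ** (1.0 / gamma)
        # Keep the sign of the original coefficient.
        enhanced.append(math.copysign(magnitude, y))
    return enhanced
```

    Calling `subtract([2.0, -2.0, 0.5], [1.0, 1.0, 1.0], [10.0, 0.0, 0.0])` subtracts lightly in the first band, where the threshold is highest, and clamps the other two bands to the noise floor while preserving their signs.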
  • Keywords
    acoustic signal processing; channel bank filters; signal denoising; signal representation; speech enhancement; speech intelligibility; wavelet transforms; bark spreading energy; generalized perceptual wavelet denoising method; generalized time-frequency subtraction algorithm; human auditory system; perceptual wavelet packet transform; psychoacoustic model; quality noise reduction algorithms; residual noise; signal-to-noise ratio; simultaneous masking; speech intelligibility; temporal masking; temporal spreading energy; unvoiced speech enhancement algorithm; wavelet coefficients; wavelet filter banks modeling; wavelet multirate signal representation; Auditory system; Filter bank; Humans; Noise reduction; Noise robustness; Psychoacoustic models; Signal to noise ratio; Speech enhancement; Time frequency analysis; Working environment noise; Auditory masking; noise reduction; speech enhancement; wavelet; Algorithms; Artificial Intelligence; Auditory Perception; Biomimetics; Computer Simulation; Humans; Models, Biological; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; Sound Spectrography; Speech Recognition Software;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4419
  • Type

    jour

  • DOI
    10.1109/TSMCB.2007.895365
  • Filename
    4267880