Title :
Binaural speech separation based on the time-frequency binary mask
Author :
Mahmoodzadeh, A. ; Abutalebi, H.R. ; Soltanian-Zadeh, Hamid ; Sheikhzadeh, H.
Author_Institution :
EE Dept., Islamic Azad Univ.-Fars Sci. & Res. Branch, Shiraz, Iran
Abstract :
Replicating the human auditory system's ability to capture a target voice and filter out interferers remains a great challenge. This paper proposes a binaural system for speech segregation based on spatial localization cues: Interaural Time Differences (ITD) and Interaural Intensity Differences (IID). A target speech signal is separated from interfering sounds by estimating time-frequency masks using the multi-level extension of the Otsu thresholding algorithm used in image segmentation. The ITD and IID are the important features for mask estimation at low and high frequencies, respectively. A systematic evaluation in terms of the Perceptual Evaluation of Speech Quality (PESQ) index shows that the resulting system yields a significant improvement in speech separation performance.
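The thresholding step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binaural cue (IID or ITD) has already been computed for every time-frequency unit, it finds thresholds by exhaustive search over histogram bins, and the names `multi_otsu_thresholds`, `binary_mask`, `n_classes`, `n_bins`, and `target_class` are hypothetical. How the cues are extracted and which class corresponds to the target are not stated in the abstract and are left to the caller here.

```python
import itertools

import numpy as np


def multi_otsu_thresholds(values, n_classes=3, n_bins=64):
    """Return n_classes - 1 thresholds maximizing between-class variance.

    Exhaustive multi-level Otsu over histogram bin edges (assumed setup).
    """
    hist, edges = np.histogram(np.ravel(values), bins=n_bins)
    p = hist / hist.sum()                      # bin probabilities
    centers = 0.5 * (edges[:-1] + edges[1:])   # bin centers

    best_score, best_cuts = -1.0, None
    for cuts in itertools.combinations(range(1, n_bins), n_classes - 1):
        bounds = (0, *cuts, n_bins)
        # Maximizing sum_k w_k * mu_k^2 is equivalent to maximizing the
        # between-class variance, because the global mean is constant.
        score = 0.0
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            w = p[lo:hi].sum()
            if w > 0:
                mu = (p[lo:hi] * centers[lo:hi]).sum() / w
                score += w * mu ** 2
        if score > best_score:
            best_score, best_cuts = score, cuts
    return [float(edges[c]) for c in best_cuts]


def binary_mask(cue, thresholds, target_class):
    """Mark time-frequency units whose cue falls in the chosen class.

    target_class is an assumption supplied by the caller (0 = lowest class).
    """
    labels = np.digitize(cue, thresholds)
    return (labels == target_class).astype(np.uint8)
```

In a system like the one described, such a routine would presumably be applied to ITD values in low-frequency channels and to IID values in high-frequency channels; the crossover frequency is not given in the abstract, so it is not modeled here.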
Keywords :
filtering theory; source separation; speech processing; time-frequency analysis; IID; ITD; Otsu thresholding algorithm; PESQ index; binaural speech separation; binaural system; human auditory system; image segmentation; interaural intensity differences; interaural time differences; mask estimation; multilevel extension; perceptual evaluation of speech quality index; spatial localization cues; speech segregation; target speech signal; target voice; time-frequency binary mask; Ear; Estimation; Histograms; Interference; Spectrogram; Speech; speech separation
Conference_Title :
Telecommunications (IST), 2012 Sixth International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4673-2072-6
DOI :
10.1109/ISTEL.2012.6483104