Title :
On generalization of classification based speech separation
Author :
Kun Han ; DeLiang Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Monaural speech separation is a challenging problem. Recent studies use supervised learning to estimate the ideal binary mask (IBM). In a supervised learning framework, generalization to conditions different from those seen in training is a paramount issue. This paper describes methods that require only a small training corpus yet generalize to unseen conditions. The system uses support vector machines (SVMs) to learn classification cues and then employs a rethresholding method to estimate the IBM. A distribution fitting method addresses unseen signal-to-noise ratio (SNR) conditions, and iterative voice activity detection addresses unseen noise conditions. Systematic evaluations show that the proposed approach produces high-quality IBM estimates under unseen conditions.
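As a rough illustration of the two ideas named in the abstract, the sketch below computes an ideal binary mask from premixed speech and noise energies and relabels classifier outputs with a shifted threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0 dB local criterion, and the example threshold are hypothetical, and the abstract does not say how the paper chooses its features, SVM settings, or thresholds.

```python
import numpy as np


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    speech_energy, noise_energy: arrays of per-unit energies (same shape).
    lc_db: local criterion in dB (0 dB here is an assumed default).
    """
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    eps = np.finfo(float).eps  # guards against division by zero and log(0)
    local_snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


def rethreshold(decision_values, threshold):
    """Relabel units by comparing classifier decision values with a chosen
    threshold instead of the default decision boundary at 0.

    How the threshold is selected is not specified here; it is an input.
    """
    decision_values = np.asarray(decision_values, dtype=float)
    return (decision_values > threshold).astype(np.uint8)


# Toy example: a 2x2 grid of time-frequency units.
speech = np.array([[4.0, 1.0], [1.0, 9.0]])
noise = np.array([[1.0, 4.0], [1.0, 1.0]])
mask = ideal_binary_mask(speech, noise)

# Hypothetical SVM decision values relabeled with a threshold of 0.5.
labels = rethreshold([-0.5, 0.2, 1.0], threshold=0.5)
```

In this toy case a unit is kept only when speech energy strictly exceeds noise energy, and moving the threshold away from 0 changes which borderline units are labeled as speech-dominant.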
Keywords :
learning (artificial intelligence); signal classification; signal detection; speech processing; support vector machines; IBM; classification based speech separation; classification cue; distribution fitting method; ideal binary mask; iterative voice activity detection; monaural speech separation; rethresholding method; signal-to-noise ratio condition; supervised learning; support vector machine; unseen noise condition; Histograms; Signal to noise ratio; Speech; Support vector machines; Time frequency analysis; Training; Generalization; Rethresholding; SVM; Speech separation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288928