DocumentCode
1359614
Title
New Results on Single-Channel Speech Separation Using Sinusoidal Modeling
Author
Mowlaee, Pejman; Christensen, Mads Græsbøll; Jensen, Søren Holdt
Author_Institution
Dept. of Electron. Syst., Aalborg Univ., Aalborg, Denmark
Volume
19
Issue
5
fYear
2011
fDate
7/1/2011
Firstpage
1265
Lastpage
1277
Abstract
We present new results on single-channel speech separation and propose a new separation approach that improves the speech quality of the signals separated from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator finds sinusoidal parameters, in the form of codevectors drawn from vector quantization (VQ) codebooks pre-trained for the individual speakers, that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signal for each speaker in the mixture. Compared to the log-max mixture estimator used in binary masks and to the Wiener filtering approach, the proposed method is observed to achieve acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of separation by replacing high-dimensional short-time Fourier transform (STFT) feature vectors with sinusoidal feature vectors. We report separation results for the proposed method and compare them with those of benchmark methods. The improvements achieved by the proposed method are confirmed by perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation, for both speaker-dependent and gender-dependent scenarios.
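The per-frame codevector search described in the abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's estimator: it uses magnitude-spectrum codebooks and a plain additive combination rule with an exhaustive joint search, whereas the proposed method operates on sinusoidal parameters. The function separate_frame and the toy codebooks are hypothetical names introduced here for illustration only.

import numpy as np

def separate_frame(mixture_mag, codebook_a, codebook_b):
    """Pick one codevector per speaker whose combination best fits the
    observed mixture magnitude spectrum (exhaustive joint search).

    mixture_mag : (D,) magnitude spectrum of the mixed frame
    codebook_a, codebook_b : (Na, D) and (Nb, D) pre-trained magnitude
        codebooks for speakers A and B (illustrative stand-ins for the
        paper's sinusoidal-parameter codebooks)
    """
    # Combine every pair of codevectors with an additive magnitude model
    # (assumed here only for illustration).
    combined = codebook_a[:, None, :] + codebook_b[None, :, :]   # (Na, Nb, D)
    # Squared-error mismatch between each combination and the mixture.
    err = np.sum((combined - mixture_mag) ** 2, axis=-1)         # (Na, Nb)
    ia, ib = np.unravel_index(np.argmin(err), err.shape)
    return codebook_a[ia], codebook_b[ib]

# Toy usage with random "codebooks" and a synthetic mixture frame.
rng = np.random.default_rng(0)
cb_a = np.abs(rng.normal(size=(64, 129)))
cb_b = np.abs(rng.normal(size=(64, 129)))
mix = cb_a[3] + cb_b[17]
est_a, est_b = separate_frame(mix, cb_a, cb_b)

The selected codevectors est_a and est_b would then drive reconstruction of the two recovered signals; replacing the full STFT vectors with low-dimensional sinusoidal feature vectors is what keeps this joint search computationally tractable.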
Keywords
Fourier transforms; Wiener filters; blind source separation; signal reconstruction; speaker recognition; speech coding; vector quantisation; MUSHRA listening test; VQ codebook; Wiener filtering; codevector; computational complexity; cross-talk; gender-dependent scenario; log-max mixture estimator; perceptual evaluation of speech quality; pitch estimate; recovered signal reconstruction; separated signal; short-time Fourier transform; signal-to-signal ratio; single-channel speech separation; sinusoidal feature vector; sinusoidal modeling; sinusoidal parameter; speaker-dependent scenario; vector quantization; Estimation error; Harmonic analysis; Hidden Markov models; Minimization; Speech; Speech enhancement; Mask methods; mixture estimation; single-channel speech separation (SCSS); sinusoidal modeling; speaker codebook;
fLanguage
English
Journal_Title
IEEE Transactions on Audio, Speech, and Language Processing
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2010.2089520
Filename
5608497
Link To Document