DocumentCode :
2930810
Title :
Speech separation by efficient combinatorial decoding of speech mixtures
Author :
Reyes-Gomez, Manuel ; Jojic, Nebojsa
Author_Institution :
MSN Appl. Res., Redmond, WA, USA
fYear :
2009
fDate :
June 28 2009-July 3 2009
Firstpage :
498
Lastpage :
505
Abstract :
We formulate the cocktail party problem as the minimization of a symmetric posimodular function defined on fragments of the signal captured by a single microphone. This formulation allows the application of tractable combinatorial optimization techniques, and in particular Queyranne's algorithm, to exactly solve a problem which was previously considered exponential in the size of the signal, and was typically addressed by greedy search or posterior distribution approximations. While the main idea described in the paper may be applicable to a variety of signal segmentation problems (e.g., image or video segmentation), we focus here on unsupervised separation of sources in mixed speech signals recorded by a single microphone. As the optimization criterion we use the likelihood under a generative model which assumes that each time-frequency bin is assigned to one of the two speakers, and that each speaker's utterance has been generated from the same generic speech model. (This assumption has previously been motivated by the sparsity of the time-frequency representation, making it unlikely that more than one speaker would dominate any given time-frequency bin.) The partition of the time-frequency space that maximizes the likelihood under the model corresponds to the one for which the resultant decoded speech of each independent source has the highest combined likelihood. The exact search over all possible assignments of the time-frequency bins to the two speakers is performed in polynomial time. Further speedups are achievable by presegmenting the spectrogram into a large number of small segments which do not violate the deformable spectrogram model. We show that this technique leads to blind separation of mixed signals where the two speakers have identical spectral characteristics, opening up a variety of possible applications in teleconferencing and telephony.
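For orientation, the combinatorial step named in the abstract can be illustrated with a minimal sketch of Queyranne's algorithm, which finds a proper nonempty subset minimizing a symmetric submodular set function in polynomial time. This is not the authors' implementation: the paper's objective is a likelihood under a speech model, whereas the sketch below assumes a toy weighted graph cut over six hypothetical "fragments" as a stand-in (a cut function is symmetric and submodular, hence posimodular). The names `edges`, `cut`, `pendant_pair`, and `queyranne` are all illustrative.

```python
def pendant_pair(groups, f):
    """Build Queyranne's ordering over the groups and return its last two."""
    order = [groups[0]]            # the starting group may be arbitrary
    rest = list(groups[1:])
    while rest:
        merged = frozenset().union(*order)
        # Next group minimizes f(W + u) - f(u), where W is everything ordered so far.
        nxt = min(rest, key=lambda g: f(merged | g) - f(g))
        rest.remove(nxt)
        order.append(nxt)
    return order[-2], order[-1]


def queyranne(elements, f):
    """Return (subset, value) minimizing a symmetric submodular f over
    proper nonempty subsets of `elements`."""
    groups = [frozenset([e]) for e in elements]
    best_set, best_val = None, float("inf")
    while len(groups) > 1:
        t, u = pendant_pair(groups, f)
        val = f(u)                 # u is a candidate minimizer
        if val < best_val:
            best_set, best_val = u, val
        # Merge the pendant pair into one group and repeat on the smaller problem.
        groups = [g for g in groups if g not in (t, u)] + [t | u]
    return best_set, best_val


# Toy stand-in objective (an assumption, not the paper's likelihood):
# two tightly connected clusters of fragments joined by one weak edge.
edges = {
    (0, 1): 5, (1, 2): 5, (0, 2): 5,
    (3, 4): 5, (4, 5): 5, (3, 5): 5,
    (2, 3): 1,
}


def cut(subset):
    """Total weight of edges with exactly one endpoint inside `subset`."""
    return sum(w for (a, b), w in edges.items() if (a in subset) != (b in subset))


best_set, best_val = queyranne(range(6), cut)
print(sorted(best_set), best_val)
```

On this toy input the returned subset is one of the two clusters, and its complement gives the other side of the two-way partition. The sketch evaluates the objective O(n^3) times, which is the source of the polynomial running time mentioned in the abstract.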
Keywords :
blind source separation; combinatorial mathematics; decoding; speech coding; teleconferencing; telephony; Queyranne algorithm; blind separation; combinatorial optimization techniques; efficient combinatorial decoding; generative model; greedy search approximation; microphone; posterior distribution approximation; signal segmentation problems; source separation; speaker utterance; spectrogram presegmentation; speech mixtures; speech separation; symmetric posimodular function; teleconferencing; telephony; Decoding; Deformable models; Image segmentation; Microphones; Polynomials; Spectrogram; Speech; Teleconferencing; Telephony; Time frequency analysis; blind source signal separation optimization Queyranne's algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
Conference_Location :
New York, NY
ISSN :
1945-7871
Print_ISBN :
978-1-4244-4290-4
Electronic_ISBN :
1945-7871
Type :
conf
DOI :
10.1109/ICME.2009.5202543
Filename :
5202543