Abstract:
In this study, we perform single-channel source separation to separate speech from background music, which degrades speech recognition performance, especially in broadcast news transcription systems. Assuming that a catalog of the background music is known, we develop a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram with a Non-negative Matrix Factorization (NMF) model and the music spectrogram with a conditional mixture model. In this model, the background music is assumed to be generated by repeating the jingle in the music catalog and changing its gain. We compare the performance of the proposed system with that of the traditional NMF model, and we address the gain estimation problem of the catalog-based method. Our results show that the traditional NMF method outperforms the catalog-based method; however, using a Gamma Markov Chain (GMC) in the gain estimation improves the separation performance and yields better separation than the NMF model.
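
As a rough illustration of the model structure described above, the observed spectrogram can be written as a sum of an NMF speech term and a gain-scaled catalog term. The notation here is assumed for illustration and is not taken from the paper: $X$ is the observed mixture spectrogram, $W$ and $H$ are the non-negative speech template and excitation matrices, $J$ is the catalog jingle spectrogram, $r_t$ is a hypothetical latent index selecting which catalog frame is active at time $t$, and $g_t \ge 0$ is the time-varying gain. Additive superposition of the two spectrograms is likewise an assumption of this sketch.

```latex
\begin{align}
  X_{f,t} &\approx S_{f,t} + M_{f,t}, \\
  S_{f,t} &= \sum_{k} W_{f,k}\, H_{k,t}, \qquad W_{f,k} \ge 0,\; H_{k,t} \ge 0, \\
  M_{f,t} &= g_t\, J_{f,\,r_t}, \qquad g_t \ge 0 .
\end{align}
```

Under this reading, the gain estimation problem amounts to inferring the sequence $g_t$, and a Gamma Markov Chain would act as a prior that couples consecutive gains.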