• DocumentCode
    3485135
  • Title

    Gain estimation approaches in catalog-based single-channel speech-music separation

  • Author

    Demir, Cemil ; Cemgil, Ali Taylan ; Saraclar, Murat

  • Author_Institution
    TUBITAK-BILGEM, Kocaeli, Turkey
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    185
  • Lastpage
    190
  • Abstract
    In this study, we analyze the gain estimation problem of the catalog-based single-channel speech-music separation method, which we proposed previously. In the proposed method, assuming that a catalog of the background music is known, we developed a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-Negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). In this model, we assume that the background music is generated by repeating the jingle in the music catalog and changing its gain. Although the separation performance of the proposed method is satisfactory with known gain values, the performance decreases when the gain value of the jingle is unknown and has to be estimated. In this paper, we address the gain estimation problem of the catalog-based method and propose three different approaches to overcome it. One of these approaches is to use a Gamma Markov Chain (GMC) probabilistic structure to impose correlation between the gain parameters across time frames. By using GMC, the gain parameter is estimated more accurately. The other approaches are maximum a posteriori (MAP) and piece-wise constant estimation (PCE) of the gain values. Although all three methods improve the separation performance compared to the original method, the GMC approach achieves the best performance.
  • Keywords
    Markov processes; matrix decomposition; maximum likelihood estimation; speech processing; stochastic processes; GMC probabilistic structure; Gamma Markov chain probabilistic structure; MAP; NMF model; PCE; catalog-based single-channel speech-music separation; conditional PMM; conditional Poisson mixture model; gain estimation approach; maximum a posteriori; music spectrograms; nonnegative matrix factorization model; piecewise constant estimation; superposed speech spectrograms; Catalogs; Estimation; Gain; Indexes; Multiple signal classification; Spectrogram; Speech;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type

    conf

  • DOI
    10.1109/ASRU.2011.6163928
  • Filename
    6163928
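
The gain estimation idea described in the abstract can be illustrated with a much simpler toy problem. The sketch below is not the paper's model: it assumes the speech component is absent, so each observed spectrogram bin is Poisson distributed with mean equal to a per-frame gain times a known jingle spectrogram. Under that assumption the maximum-likelihood gain is the ratio of summed observations to summed jingle magnitudes, and a piece-wise constant variant pools that ratio over blocks of frames. The function names, the block length, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def ml_gain_per_frame(X, M, eps=1e-12):
    """Per-frame ML gain, assuming X[f, t] ~ Poisson(g[t] * M[f, t]).

    Setting the derivative of the Poisson log-likelihood to zero gives
    g[t] = sum_f X[f, t] / sum_f M[f, t].
    """
    return X.sum(axis=0) / (M.sum(axis=0) + eps)


def piecewise_constant_gain(X, M, block, eps=1e-12):
    """ML gain shared by each run of `block` consecutive frames.

    Assumption for this sketch: the gain is constant inside fixed-length
    blocks, so observations are pooled over all bins in a block.
    """
    n_frames = X.shape[1]
    g = np.empty(n_frames)
    for start in range(0, n_frames, block):
        sl = slice(start, start + block)
        g[sl] = X[:, sl].sum() / (M[:, sl].sum() + eps)
    return g


# Synthetic example: a known "jingle" spectrogram M and a gain that
# switches from 0.5 to 2.0 halfway through (no speech component).
rng = np.random.default_rng(0)
n_bins, n_frames = 64, 40
M = rng.gamma(2.0, 5.0, size=(n_bins, n_frames))
true_g = np.repeat([0.5, 2.0], n_frames // 2)
X = rng.poisson(true_g * M)

g_frame = ml_gain_per_frame(X, M)
g_block = piecewise_constant_gain(X, M, block=20)
```

Pooling over a block trades time resolution for lower variance in the estimate. Coupling neighbouring frames through a prior, as the GMC approach in the abstract does, is a softer way to exploit the same temporal correlation, and it is not implemented here.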