• DocumentCode
    712894
  • Title
    Speech/music separation using non-negative matrix factorization with combination of cost functions
  • Author
    Nasersharif, Babak ; Abdali, Sara
  • Author_Institution
    Fac. of Comput. Eng., K.N. Toosi Univ. of Technol., Tehran, Iran
  • fYear
    2015
  • fDate
    3-5 March 2015
  • Firstpage
    107
  • Lastpage
    111
  • Abstract
    Non-negative Matrix Factorization (NMF) is one solution for separating speech from music as a single-channel source separation problem. In this approach, the spectrogram of each source signal is factorized into the product of two matrices, known as the basis and weight matrices. To obtain a proper estimate of the signal spectrogram, the weight and basis matrices are updated iteratively. A cost function is typically used to measure the distance between the signal and its estimate. Different cost functions have been introduced based on the Kullback-Leibler (KL) and Itakura-Saito (IS) divergences. The IS divergence is scale-invariant, so it is suitable for conditions in which the signal coefficients have a large dynamic range, for example in music short-term spectra. Based on this property of the IS divergence, in this paper we propose using the IS divergence as the NMF cost function in the training stage for music and the KL divergence as the NMF cost function in the training stage for speech. Moreover, in the decomposition stage, we propose using a linear combination of these two divergences together with a regularization term that incorporates temporal continuity as prior knowledge. Experimental results on one hour of speech and music show a good trade-off between the signal-to-interference ratio (SIR) of speech and music in comparison to conventional NMF methods.
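    The ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard beta-divergence form of NMF, where beta = 1 gives the KL divergence and beta = 0 gives the IS divergence, and uses majorization-minimization multiplicative updates. The names `nmf`, `combined_cost`, and the weight `alpha` are hypothetical, and the temporal continuity regularizer mentioned in the abstract is omitted because its exact form is not given in this record.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence D_KL(V || V_hat), summed over all bins."""
    V = V + EPS
    V_hat = V_hat + EPS
    return float(np.sum(V * np.log(V / V_hat) - V + V_hat))


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V || V_hat), summed over all bins."""
    R = (V + EPS) / (V_hat + EPS)
    return float(np.sum(R - np.log(R) - 1.0))


def nmf(V, rank, beta, n_iter=200, seed=0):
    """Factorize a non-negative spectrogram V ~= B @ W.

    B holds the basis vectors (columns), W holds the weights (rows).
    Only beta = 1 (KL) and beta = 0 (IS) are intended here.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, rank)) + EPS
    W = rng.random((rank, n_frames)) + EPS
    # Exponent of the majorization-minimization update: 1/2 for beta < 1, else 1.
    gamma = 0.5 if beta < 1 else 1.0
    for _ in range(n_iter):
        V_hat = B @ W + EPS
        num = B.T @ (V * V_hat ** (beta - 2))
        den = B.T @ V_hat ** (beta - 1) + EPS
        W *= (num / den) ** gamma
        V_hat = B @ W + EPS
        num = (V * V_hat ** (beta - 2)) @ W.T
        den = V_hat ** (beta - 1) @ W.T + EPS
        B *= (num / den) ** gamma
    return B, W


def combined_cost(V, V_hat, alpha):
    """Hypothetical linear combination of the two divergences, weighted by alpha in [0, 1]."""
    return alpha * kl_divergence(V, V_hat) + (1.0 - alpha) * is_divergence(V, V_hat)
```

    Under these assumptions, a music basis could be trained with `nmf(V_music, rank, beta=0.0)` and a speech basis with `nmf(V_speech, rank, beta=1.0)`, where `V_music` and `V_speech` are placeholder magnitude or power spectrograms. Scaling both arguments of `is_divergence` by the same constant leaves its value unchanged, which is the scale-invariance property the abstract relies on, whereas `kl_divergence` grows linearly with that constant.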
  • Keywords
    matrix decomposition; music; source separation; speech processing; IS divergences; IS property; Itakura-Saito divergences; KL divergences; Kullback-Leibler divergences; NMF; SIR; basis matrices; cost functions; decomposition stage; music short-term spectra; music signal; nonnegative matrix factorization; signal coefficients; signal distance estimation; signal spectrogram estimation; signal to interference ratio; single channel source separation; source signal; speech/music separation; weight matrices; Cost function; Matrix decomposition; Multiple signal classification; Source separation; Speech; Standards; Training; Itakura-Saito divergence; Kullback-Leibler divergence; Music; Non-negative Matrix Factorization (NMF); Single Channel Source Separation; Speech;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-8817-4
  • Type
    conf
  • DOI
    10.1109/AISP.2015.7123491
  • Filename
    7123491