• DocumentCode
    29491
  • Title
    Audio-visual underdetermined blind source separation algorithm based on Gaussian potential function
  • Author
    Zhang Ye ; Cao Kang ; Wu Kangrui ; Yu Tenglong ; Zhou Nanrun
  • Author_Institution
    Dept. of Electron. Inf. Eng., Nanchang Univ., Nanchang, China
  • Volume
    11
  • Issue
    6
  • fYear
    2014
  • fDate
    Jun-14
  • Firstpage
    71
  • Lastpage
    80
  • Abstract
    Most existing algorithms for the underdetermined blind source separation (UBSS) problem are two-stage algorithms, consisting of mixing-parameter estimation followed by source estimation. In the mixing-parameter estimation stage, traditional clustering algorithms are sensitive to the initialization of the mixing parameters. To reduce this sensitivity, we propose a new algorithm for the UBSS problem with anechoic speech mixtures that employs visual information to initialize the mixing parameters, i.e., the interaural time difference (ITD) and the interaural level difference (ILD). In our algorithm, the video signals are used to estimate the distances between microphones and sources, from which estimates of the ITD and ILD are obtained. Under the sparsity assumption in the time-frequency domain, the Gaussian potential function algorithm then estimates the mixing parameters, using these ITDs and ILDs as initial values. Finally, time-frequency masking is used to recover the sources by evaluating the various ITDs and ILDs. Experimental results demonstrate the competitive performance of the proposed algorithm compared with the baseline algorithms.
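    The step from estimated microphone-source distances to initial ITD and ILD values can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-microphone anechoic model in which amplitude decays as 1/distance and sound travels at 343 m/s, and the function name `itd_ild_from_distances` is hypothetical.

```python
import math

# Assumed speed of sound in air at room temperature (m/s); not given in the abstract.
SPEED_OF_SOUND = 343.0


def itd_ild_from_distances(d1, d2, c=SPEED_OF_SOUND):
    """Return (itd_seconds, ild_db) for one source.

    d1, d2: distances in metres from the source to microphones 1 and 2.
    Assumes an anechoic path with 1/distance amplitude decay, so both
    values describe microphone 2 relative to microphone 1.
    """
    if d1 <= 0 or d2 <= 0:
        raise ValueError("distances must be positive")
    # Extra travel time to microphone 2 compared with microphone 1.
    itd = (d2 - d1) / c
    # Level at microphone 2 relative to microphone 1, in decibels.
    ild = 20.0 * math.log10(d1 / d2)
    return itd, ild


if __name__ == "__main__":
    # Hypothetical distances, chosen only for illustration.
    itd, ild = itd_ild_from_distances(1.0, 1.2)
    print(f"ITD = {itd * 1e3:.3f} ms, ILD = {ild:.2f} dB")
```

    Under these assumptions, a source closer to microphone 1 yields a positive ITD and a negative ILD; such values would only serve as starting points for the subsequent mixing-parameter estimation.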
  • Keywords
    Gaussian processes; audio-visual systems; blind source separation; estimation theory; microphones; speech processing; time-frequency analysis; video signal processing; Gaussian potential function algorithm; ILD estimation; ITD estimation; anechoic speech mixture; audio-visual UBSS algorithm; distance estimation; interaural level difference; interaural time difference; microphones; mixing parameter estimation; source estimation; sparsity assumption; time-frequency masking; underdetermined blind separation algorithm; video signal processing; visual information; Algorithm design and analysis; Audio-visual systems; Clustering algorithms; Hidden Markov models; Signal processing algorithms; Visualization; Gaussian potential function; interaural level difference; interaural time difference; underdetermined blind source separation; visual information;
  • fLanguage
    English
  • Journal_Title
    Communications, China
  • Publisher
    IEEE
  • ISSN
    1673-5447
  • Type
    jour
  • DOI
    10.1109/CC.2014.6879005
  • Filename
    6879005