DocumentCode :
3604732
Title :
Representation Learning for Single-Channel Source Separation and Bandwidth Extension
Author :
Zöhrer, Matthias ; Peharz, Robert ; Pernkopf, Franz
Author_Institution :
Intell. Syst. Group at the Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
Volume :
23
Issue :
12
fYear :
2015
Firstpage :
2398
Lastpage :
2409
Abstract :
In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed and require source-specific prior knowledge. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data of the 2nd CHiME speech separation challenge and provide results for speaker-dependent, speaker-independent, matched noise condition, and unmatched noise condition tasks. GSNs obtain the best PESQ and overall perceptual scores on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR, and significantly outperform SPNs embedded in hidden Markov models as well as the other representation models.
Keywords :
Boltzmann machines; hidden Markov models; learning (artificial intelligence); signal denoising; source separation; speaker recognition; 2nd CHiME speech separation challenge; ABE; GSN; PESQ; SCSS; SPN; artificial bandwidth extension; deep representation learning; frame-wise GSN; frequency-domain segmental SNR; generative models; generative stochastic networks; hidden Markov models; higher-order contractive autoencoders; ill-posed prior knowledge; matched noise condition; missing frequency band reconstruction; model-based single-channel source separation; overall perceptual score; restricted Boltzmann machines; source-specific prior knowledge; speaker dependent; speaker independent; sum-product networks; unmatched noise condition task; Adaptation models; Bandwidth; Data models; Hidden Markov models; Learning systems; Neural networks; Spectrogram; Speech processing; Stochastic processes; Bandwidth extension; deep neural networks (DNNs); generative stochastic networks; representation learning; single-channel source separation (SCSS); sum-product networks;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2470560
Filename :
7210172