Regional Information Center for Science and Technology (RICEST) - Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

DocumentCode :

1414275

Title :

Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

Author :

Durrieu, Jean-Louis ; Richard, Gaël ; David, Bertrand ; Févotte, Cédric

Author_Institution :

Inst. TELECOM, TELECOM ParisTech, Paris, France

Volume :

Issue :

fYear :

2010

fDate :

3/1/2010

Firstpage :

564

Lastpage :

575

Abstract :

Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent, it is related to the concept of source separation, reflecting the human ability to focus on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (in particular, the leading vocal part) from polyphonic audio signals. To that end, we propose a new signal model in which the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the parameters are estimated within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.

Keywords :

Gaussian processes; audio signal processing; filtering theory; maximum likelihood estimation; source separation; Gaussian scaled mixture model; extended instantaneous mixture model; maximum-likelihood framework; polyphonic audio signals; polyphonic music recording; signal model; single-channel source separation; source/filter model; statistical model; unsupervised main melody extraction; Audio recording; Data mining; Filters; Frequency estimation; Humans; Maximum likelihood estimation; Parameter estimation; Performance evaluation; Source separation; Testing; Blind audio source separation; Expectation–Maximization (EM) algorithm; Gaussian scaled mixture model (GSMM); main melody extraction; maximum likelihood; music; non-negative matrix factorization (NMF); source/filter model; spectral analysis;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2041114

Filename :

5410055

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1414275