Regional Information Center for Science and Technology (RICEST) - Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms

DocumentCode :

65478

Title :

Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms

Author :

Tachibana, Hideyuki ; Ono, Nobutaka ; Sagayama, Shigeki

Author_Institution :

Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan

Volume :

Issue :

fYear :

2014

fDate :

Jan. 2014

Firstpage :

228

Lastpage :

237

Abstract :

We propose a novel singing voice enhancement technique for monaural music audio signals, a challenging problem. Many singing voice enhancement techniques have been proposed recently, but our approach is based on a quite different idea. We focus on the fluctuation of the singing voice and detect it by exploiting two spectrograms of different resolution: one with high temporal resolution and low frequency resolution, the other with high frequency resolution and low temporal resolution. On these two spectrograms, the shapes of fluctuating components are quite different. Based on this idea, we propose a singing voice enhancement technique that we call two-stage harmonic/percussive sound separation (HPSS). In this paper, we describe the details of two-stage HPSS and evaluate its performance. The experimental results show that the signal-to-distortion ratio (SDR), a commonly used criterion for this task, improved by around 4 dB, which is considerably higher than with existing methods. We also evaluated the method as a preprocessing step for melody estimation in music. The results show that our singing voice enhancement technique considerably improved the performance of a simple pitch estimation technique. These results demonstrate the effectiveness of the proposed method.
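The two-resolution idea in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: it swaps in median-filter HPSS with soft masks as a simple stand-in for whatever HPSS formulation the paper uses. It also assumes that the long-window stage runs first (so the fluctuating voice lands on the "percussive" side together with the drums) and that the short-window stage then pulls the voice out as the "harmonic" part. The function names `hpss` and `two_stage_hpss`, the window lengths, and the kernel size are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import istft, stft


def hpss(x, fs, nperseg, kernel=17):
    """Split x into (harmonic, percussive) parts at one STFT resolution.

    Median-filter HPSS is an assumed stand-in, not the paper's method.
    """
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(X)
    # Median along time (axis 1) keeps horizontal ridges: "harmonic".
    harm = median_filter(mag, size=(1, kernel))
    # Median along frequency (axis 0) keeps vertical ridges: "percussive".
    perc = median_filter(mag, size=(kernel, 1))
    # Soft mask; the percussive mask is its complement, so the parts sum to x.
    mask_h = harm**2 / (harm**2 + perc**2 + 1e-12)
    _, h = istft(mask_h * X, fs=fs, nperseg=nperseg)
    _, p = istft((1.0 - mask_h) * X, fs=fs, nperseg=nperseg)
    return h[: len(x)], p[: len(x)]


def two_stage_hpss(x, fs, long_win=4096, short_win=512):
    """Return (voice, accompaniment, drums) estimates (illustrative values)."""
    # Stage 1, long window (high frequency resolution): steady tones look
    # harmonic; a fluctuating voice smears and is assumed to go with "rest".
    accomp, rest = hpss(x, fs, long_win)
    # Stage 2, short window (high temporal resolution): the voice now looks
    # harmonic, while drum hits stay percussive.
    voice, drums = hpss(rest, fs, short_win)
    return voice, accomp, drums
```

Called as `voice, accomp, drums = two_stage_hpss(x, fs)` on a mono signal `x` sampled at `fs` Hz, the three outputs add back up to the input because each stage uses complementary masks. How well the `voice` output actually isolates a singer depends on the window lengths and on the HPSS method used, neither of which this sketch takes from the paper.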

Keywords :

music; speech enhancement; HPSS; monaural music audio signals; multiple resolution spectrograms; pitch estimation technique; singing voice enhancement; two-stage harmonic-percussive sound separation; Estimation; IEEE transactions; Multiple signal classification; Music; Spectrogram; Speech; Speech processing; Fluctuation; harmonic and percussive sound separation; multiple resolution; non-stationarity; pitch detection

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2013.2287052

Filename :

6646221

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=65478