A pairwise algorithm for pitch estimation and speech separation using deep stacking network

Author

Hui Zhang ; Xueliang Zhang ; Shuai Nie ; Guanglai Gao ; Wenju Liu

Author_Institution

Comput. Sci. Dept., Inner Mongolia Univ., Hohhot, China

fYear

2015

fDate

19-24 April 2015

Firstpage

246

Lastpage

250

Abstract

Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is as challenging as speech separation itself. In this paper, we propose a supervised learning architecture that addresses the two problems jointly. The proposed algorithm is based on the deep stacking network (DSN), which builds a deep architecture by stacking simple processing modules. In the training stage, the ideal binary mask is used as the target. The input vector of each module comprises the outputs of the lower module and frame-level features, which consist of spectral and pitch-based features. In the testing stage, each module produces an estimated binary mask, which is used to re-estimate the pitch. The updated pitch-based features are then passed to the next module. This procedure is iterated through the DSN, and the final separation result is taken from the last module. Systematic evaluations show that the proposed approach produces high-quality estimated binary masks and generalizes better than recent systems.
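
The test-stage loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-hidden-layer module, the mask-weighted pitch picker, the all-ones initial mask, and every name used here (`module_forward`, `estimate_pitch`, `pitch_features`, `separate`, `periodicity`) are assumptions made for illustration. The abstract does not specify the module architecture, the pitch tracker, or the feature definitions, and training against the ideal binary mask is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def module_forward(weights, x):
    """One stacked module (assumed form): one hidden layer, sigmoid outputs."""
    w_hidden, w_out = weights
    return sigmoid(sigmoid(x @ w_hidden) @ w_out)

def estimate_pitch(mask, periodicity):
    """Hypothetical pitch picker: per frame, choose the candidate lag whose
    periodicity, summed over the channels kept by the mask, is largest.
    mask: (frames, channels); periodicity: (frames, channels, lags)."""
    score = np.einsum("tc,tcl->tl", mask, periodicity)
    return score.argmax(axis=1)

def pitch_features(lags, periodicity):
    """Hypothetical pitch-based feature: each channel's periodicity at the
    estimated pitch lag of the frame. Returns (frames, channels)."""
    frames = np.arange(periodicity.shape[0])
    return periodicity[frames, :, lags]

def separate(spectral, periodicity, modules):
    """Pass features through the stacked modules, re-estimating pitch from
    each module's mask before feeding the next module."""
    n_frames, n_channels, _ = periodicity.shape
    # Assumption: the first module has no lower module, so start from an
    # all-ones mask to obtain an initial pitch estimate.
    mask = np.ones((n_frames, n_channels))
    for weights in modules:
        lags = estimate_pitch(mask, periodicity)      # re-estimate pitch
        pitch_feat = pitch_features(lags, periodicity)  # update pitch features
        x = np.concatenate([spectral, pitch_feat, mask], axis=1)
        mask = (module_forward(weights, x) > 0.5).astype(float)
    # Final mask comes from the last module; pitch is taken from that mask.
    return mask, estimate_pitch(mask, periodicity)
```

With trained weights, `separate` would be called with one weight pair per module; each module's input width is the spectral feature dimension plus twice the number of channels, because the pitch-based features and the lower module's mask are both per-channel in this sketch.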

Keywords

learning (artificial intelligence); speech processing; binary mask; deep stacking network; pairwise algorithm; pitch estimation; speech separation; supervised learning architecture; Noise; Speech; Testing; Training; Computational auditory scene analysis; Pitch estimation; Speech separation; Supervised learning;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7177969

Filename

7177969