DocumentCode :
107519
Title :
Neural Network Based Pitch Tracking in Very Noisy Speech
Author :
Kun Han ; DeLiang Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume :
22
Issue :
12
fYear :
2014
fDate :
Dec. 2014
Firstpage :
2158
Lastpage :
2168
Abstract :
Pitch determination is a fundamental problem in speech processing that has been studied for decades. However, it is challenging to determine pitch in strong noise because the harmonic structure is corrupted. In this paper, we estimate pitch using supervised learning, where the probabilistic pitch states are directly learned from noisy speech data. We investigate two alternative neural networks for modeling the pitch state distribution given the observations. The first is a feedforward deep neural network (DNN), which is trained on static frame-level acoustic features. The second is a recurrent deep neural network (RNN), which is trained on sequential frame-level features and is capable of learning temporal dynamics. Both DNNs and RNNs produce accurate probabilistic outputs of pitch states, which are then connected into pitch contours by Viterbi decoding. Our systematic evaluation shows that the proposed pitch tracking algorithms are robust to different noise conditions and can even be applied to reverberant speech. The proposed approach also significantly outperforms other state-of-the-art pitch tracking algorithms.
Keywords :
Viterbi decoding; feedforward neural nets; learning (artificial intelligence); probability; recurrent neural nets; speech coding; DNN; RNN; Viterbi decoding; feedforward deep neural network; noisy speech data; pitch determination; pitch state distribution; pitch tracking; probabilistic pitch states; recurrent deep neural network; reverberant speech; speech processing; static frame-level acoustic features; supervised learning; temporal dynamics; Feature extraction; Hidden Markov models; Neural networks; Noise; Probabilistic logic; Speech; Training; Deep neural networks (DNNs); pitch estimation; recurrent neural networks (RNNs); supervised learning; Viterbi decoding;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2363410
Filename :
6923432