DocumentCode :
44628
Title :
Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique
Author :
Huang, Feng ; Lee, Tan
Author_Institution :
Dept. of Electron. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
Volume :
21
Issue :
1
fYear :
2013
fDate :
Jan. 2013
Firstpage :
99
Lastpage :
109
Abstract :
Pitch estimation from acoustic signals is a fundamental problem in many areas of speech research. For noise-corrupted speech, reliable pitch estimation is difficult. This paper presents a study of pitch estimation in noisy speech based on robust temporal-spectral representation and sparse reconstruction. We propose to accumulate spectral peaks over consecutive time frames. Since harmonic structure of speech changes much more slowly than noise spectrum, spectral peaks related to pitch harmonics would stand out over the noise through the accumulation. Experimental results show that the accumulated peak spectrum is indeed a robust representation of pitch harmonics. Subsequently, the accumulated peak spectrum is expressed as a sparse linear combination of a large set of clean peak spectrum exemplars. Gaussian mixture density is used to model noise spectrum peaks. The weights of the linear combination are estimated so as to maximize the likelihood of the accumulated peak spectrum under sparsity constraint. Robust pitch estimation is done based on the sparse weights and the corresponding peak spectrum exemplars. The use of Gaussian mixture model leads to non-convexity of the objective function for sparse weight estimation. By approximation and reformulation, two convex optimization approaches are developed to estimate the weights. Extensive experimental studies are carried out to evaluate performance of the proposed pitch estimation algorithms on a wide variety of noise conditions. It is clearly shown that the proposed methods significantly and consistently outperform the conventional methods, particularly at very low signal-to-noise ratios (e.g., SNR < -5 dB).
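The accumulation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window, the strict local-maximum peak picker, the frame sizes, and the synthetic test signal are all assumptions chosen for illustration, and the sparse-weight estimation stage is not shown.

```python
import numpy as np


def peak_spectrum(frame, n_fft):
    """Magnitude spectrum of one frame with all non-peak bins set to zero.

    Assumption: a "peak" is any bin strictly larger than both neighbours.
    """
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n_fft))
    peaks = np.zeros_like(mag)
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner > mag[2:])
    peaks[1:-1][is_peak] = inner[is_peak]
    return peaks


def accumulated_peak_spectrum(signal, frame_len, hop, n_frames, n_fft):
    """Sum the peak spectra of n_frames consecutive frames."""
    acc = np.zeros(n_fft // 2 + 1)
    for k in range(n_frames):
        start = k * hop
        acc += peak_spectrum(signal[start:start + frame_len], n_fft)
    return acc


# Hypothetical test signal: five harmonics of 200 Hz in white noise.
fs = 8000
f0 = 200.0
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(1, 6))
noisy = clean + rng.normal(0.0, 1.0, size=t.shape)

n_fft = 1024
acc = accumulated_peak_spectrum(noisy, frame_len=256, hop=80,
                                n_frames=8, n_fft=n_fft)

# Frequency (Hz) of the strongest accumulated peak.
peak_hz = np.argmax(acc) * fs / n_fft
```

Because noise peaks fall on different bins from frame to frame while harmonic peaks stay close to the same bins, the strongest entry of `acc` is expected to lie near a multiple of the assumed 200 Hz fundamental in this toy setup.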
Keywords :
Gaussian processes; acoustic signal processing; convex programming; estimation theory; performance evaluation; speech processing; Gaussian mixture density; Gaussian mixture model; accumulated peak spectrum; acoustic signals; clean peak spectrum exemplars; consecutive time frames; corresponding peak spectrum exemplars; harmonic structure; noise spectrum peaks; noise-corrupted speech; noisy speech; nonconvexity; objective function; pitch harmonics; reliable pitch estimation; robust pitch estimation; robust representation; robust temporal-spectral representation; sparse estimation technique; sparse linear combination; sparse reconstruction; sparse weight estimation; sparse weights; sparsity constraint; spectral peaks; speech research; Estimation; Frequency estimation; Harmonic analysis; Noise; Robustness; Speech; Speech processing; $l_{1}$ regularization; pitch estimation; sparse reconstruction; temporally accumulated peak spectrum;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2215589
Filename :
6307828