DocumentCode :
134340
Title :
Signal to noise ratio estimation based on an optimal design of subband voice activity detection
Author :
Morita, S. ; Xugang Lu ; Unoki, Masashi
Author_Institution :
Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Ishikawa, Japan
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
560
Lastpage :
564
Abstract :
Estimates of the signal-to-noise ratio (SNR) of speech play an important role in noise reduction and in predicting speech intelligibility based on the speech transmission index (STI). Voice activity detection (VAD) must be used, explicitly or implicitly, during SNR estimation to detect speech and non-speech sections. Most studies fix the VAD decision threshold used to classify speech and non-speech during SNR estimation. We argue that fixing the decision threshold for all testing conditions is not optimal for controlling the false acceptance and miss detection rates of speech. In this paper, we propose an SNR estimation method that uses a speech and non-speech detection algorithm based on optimizing the trade-off between the false speech acceptance and miss detection rates on a receiver operating characteristic (ROC) curve. Rather than fixing the VAD decision threshold for all SNR conditions, we estimate the optimal decision threshold from an ROC curve for each SNR condition. Thresholds are optimized on subband signals using a large training data set covering various SNR conditions and noise types. After speech and non-speech are detected, the SNR is estimated by aggregating the subband powers of speech and noise over all subbands. We evaluated the proposed method on the AURORA2J and NOISEX-92 data corpora. The experimental results demonstrated that the proposed method was more accurate than the classical method of estimating SNR. The proposed approach could be used in robust VAD and STI estimation.
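The two steps the abstract describes, choosing a decision threshold from an ROC curve and then estimating SNR from subband powers, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of frame power in dB as the detection statistic, the "closest to the ideal ROC corner" criterion, and the reading of the subband aggregation as a plain sum are all assumptions made for illustration.

```python
import numpy as np


def pick_threshold(scores, labels):
    """Pick a decision threshold from an empirical ROC curve.

    scores: per-frame detection statistic (assumed: higher = more speech-like).
    labels: 1 for speech frames, 0 for non-speech frames (both must occur).
    Criterion (an assumption of this sketch): minimise FAR^2 + MISS^2,
    i.e. the operating point closest to the ideal ROC corner.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_thr, best_cost = None, np.inf
    for thr in np.unique(scores):
        decided_speech = scores >= thr
        far = np.mean(decided_speech[~labels])   # non-speech accepted as speech
        miss = np.mean(~decided_speech[labels])  # speech frames missed
        cost = far ** 2 + miss ** 2
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr


def estimate_snr_db(subband_power, thresholds_db):
    """Estimate a global SNR in dB from per-subband frame powers.

    subband_power: array of shape (n_subbands, n_frames).
    thresholds_db: one decision threshold per subband, applied to the
                   frame power in dB (hypothetical detection statistic).
    """
    power = np.asarray(subband_power, dtype=float)
    speech_total, noise_total = 0.0, 0.0
    for band, thr in zip(power, thresholds_db):
        is_speech = 10.0 * np.log10(band + 1e-12) >= thr
        if is_speech.all() or not is_speech.any():
            continue  # cannot separate speech from noise in this subband
        noise = band[~is_speech].mean()   # noise power from non-speech frames
        noisy = band[is_speech].mean()    # speech-plus-noise power
        speech_total += max(noisy - noise, 0.0)
        noise_total += noise
    if speech_total <= 0.0 or noise_total <= 0.0:
        return float("nan")
    return 10.0 * np.log10(speech_total / noise_total)
```

In this sketch, `pick_threshold` would be run once per subband and per training SNR condition on labelled data, and the resulting thresholds passed to `estimate_snr_db`. How the appropriate SNR condition is selected at test time is not stated in the abstract and is left open here.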
Keywords :
signal classification; signal denoising; signal detection; speech intelligibility; speech recognition; AURORA2J; NOISEX-92 data corpora; ROC curve; SNR conditions; SNR estimates; STI; VAD; decision threshold; false speech acceptance; miss detection rates; noise reduction; noise types; nonspeech classifications; nonspeech sections; optimal design; receiver operating characteristic curve; signal to noise ratio estimation; speech detection; speech transmission index; subband powers; subband signals; subband voice activity detection; Estimation; Noise measurement; Signal to noise ratio; Speech; Testing; White noise; decision of threshold; subband processing; voice activity detection
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936717
Filename :
6936717