Title :
Using Artificial Neural Network for Robust Voice Activity Detection Under Adverse Conditions
Author :
Pham, Tuan V. ; Tang, Chien T. ; Stadtschnitzer, Michael
Author_Institution :
Electron. & Telecomm. Engr. Dept., Univ. of Danang, Danang, Vietnam
Abstract :
We present an approach to model-based voice activity detection (VAD) for harsh environments. An artificial neural network is optimally trained on mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples in order to provide a reliable model. There are three main aspects to this study: First, in addition to the developed model, recent state-of-the-art VAD methods are analyzed extensively. Second, we present an optimization procedure for neural network training, including evaluation of the trained network's performance with proper measures. Third, we provide a large set of empirical results on the noisy TIMIT and SNOW corpora, covering different types of noise at different signal-to-noise ratios. We evaluate the resulting VAD model on the noisy corpora and compare it against state-of-the-art VAD methods such as ITU-T Rec. G.729 Annex B, ETSI AFE ES 202 050, and recent promising VAD algorithms. Results show that: (i) the proposed neural network classifier using MFCC features achieves robustly high scores under different noisy conditions; (ii) the proposed model is superior to other VAD methods in terms of various classification measures; (iii) the robustness of the developed VAD algorithm still holds when it is tested in a completely mismatched environment.
Keywords :
learning (artificial intelligence); neural nets; signal detection; speech processing; ETSI AFE ES 202 050; ITU-T Rec G 729 Annex B; MFCC feature; SNOW corpus; TIMIT corpus; artificial neural network; feature extraction; mel-frequency cepstral coefficient; neural network classifier; neural network training; robust voice activity detection; signal-to-noise ratio; speech sample; Artificial neural networks; Cepstral analysis; Feature extraction; Neural networks; Robustness; Signal to noise ratio; Snow; Speech; Telecommunication standards; Working environment noise;
Conference_Title :
Computing and Communication Technologies, 2009. RIVF '09. International Conference on
Conference_Location :
Da Nang
Print_ISBN :
978-1-4244-4566-0
Electronic_ISBN :
978-1-4244-4568-4
DOI :
10.1109/RIVF.2009.5174662