DocumentCode :
700160
Title :
Using audio-visual features for robust voice activity detection in clean and noisy speech
Author :
Almajai, Ibrahim ; Milner, Ben
Author_Institution :
Sch. of Comput. Sci., Univ. of East Anglia, Norwich, UK
fYear :
2008
fDate :
25-29 Aug. 2008
Firstpage :
1
Lastpage :
5
Abstract :
The aim of this work is to utilize both audio and visual speech information to create a robust voice activity detector (VAD) that operates in both clean and noisy speech. A statistical audio-only VAD is developed first, using MFCC vectors as input. Second, a visual-only VAD is produced which uses 2-D discrete cosine transform (DCT) visual features. The two VADs are then integrated into an audio-visual VAD (AV-VAD). A weighting term is introduced to vary the contribution of the audio and visual components according to the input signal-to-noise ratio (SNR). Experimental results first establish the optimal configuration of the classifier and show that higher accuracy is obtained when temporal derivatives are included. Tests in white noise down to an SNR of -20 dB show the AV-VAD to be highly robust, with accuracy remaining above 97%. Comparison with the ETSI Aurora VAD shows the AV-VAD to be significantly more accurate.
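The abstract describes fusing audio and visual VAD scores with an SNR-dependent weight. The sketch below illustrates one plausible form of such a fusion; the linear weighting rule, the SNR limits, and the function/parameter names (av_vad_decision, snr_low, snr_high, threshold) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def av_vad_decision(p_audio, p_visual, snr_db,
                    snr_low=-10.0, snr_high=20.0, threshold=0.5):
    """Fuse per-frame audio and visual speech probabilities with an
    SNR-dependent weight: high SNR trusts the audio VAD, low SNR shifts
    weight toward the visual VAD. Illustrative sketch only."""
    # Map the input SNR (dB) to a weight alpha in [0, 1].
    alpha = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
    # Weighted combination of the two per-frame speech probabilities.
    p_av = alpha * np.asarray(p_audio) + (1.0 - alpha) * np.asarray(p_visual)
    # Threshold to obtain a binary speech/non-speech decision per frame.
    return p_av > threshold

# Usage example: at -20 dB the decision is driven almost entirely by
# the visual stream, consistent with the robustness claim in the abstract.
audio_probs = [0.2, 0.3, 0.1]   # hypothetical audio VAD outputs
visual_probs = [0.9, 0.8, 0.95]  # hypothetical visual VAD outputs
print(av_vad_decision(audio_probs, visual_probs, snr_db=-20.0))
```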
Keywords :
audio-visual systems; discrete cosine transforms; signal denoising; speech processing; statistical analysis; 2D DCT visual features; 2D discrete cosine transform; AV-VAD; ETSI Aurora VAD; MFCC vector; SNR; audio speech information; audio-visual VAD; clean speech; noisy speech; robust voice activity detector; signal-to-noise ratio; statistical-based audio-only VAD; visual speech information; visual-only VAD; Accuracy; Feature extraction; Signal to noise ratio; Speech; Support vector machine classification; Visualization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2008 16th European Signal Processing Conference
Conference_Location :
Lausanne
ISSN :
2219-5491
Type :
conf
Filename :
7080692