DocumentCode :
700160
Title :
Using audio-visual features for robust voice activity detection in clean and noisy speech
Author :
Almajai, Ibrahim ; Milner, Ben
Author_Institution :
Sch. of Comput. Sci., Univ. of East Anglia, Norwich, UK
fYear :
2008
fDate :
25-29 Aug. 2008
Firstpage :
1
Lastpage :
5
Abstract :
The aim of this work is to utilize both audio and visual speech information to create a robust voice activity detector (VAD) that operates in both clean and noisy speech. A statistical audio-only VAD is developed first, using MFCC vectors as input. Second, a visual-only VAD is produced which uses 2-D discrete cosine transform (DCT) visual features. The two VADs are then integrated into an audio-visual VAD (AV-VAD). A weighting term is introduced to vary the contribution of the audio and visual components according to the input signal-to-noise ratio (SNR). Experimental results first establish the optimal configuration of the classifier and show that higher accuracy is obtained when temporal derivatives are included. Tests in white noise down to an SNR of -20 dB show the AV-VAD to be highly robust, with accuracy remaining above 97%. Comparison with the ETSI Aurora VAD shows the AV-VAD to be significantly more accurate.
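The abstract describes fusing audio and visual VAD scores with an SNR-dependent weight. The sketch below illustrates one plausible form of such a fusion; the linear weighting rule, the SNR limits, and the function/parameter names (av_vad_decision, snr_low, snr_high, threshold) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def av_vad_decision(p_audio, p_visual, snr_db,
                    snr_low=-10.0, snr_high=20.0, threshold=0.5):
    """Fuse per-frame audio and visual speech probabilities with an
    SNR-dependent weight: high SNR trusts the audio VAD, low SNR shifts
    weight toward the visual VAD. Illustrative sketch only."""
    # Map the input SNR (dB) to a weight alpha in [0, 1].
    alpha = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
    # Weighted combination of the two per-frame speech probabilities.
    p_av = alpha * np.asarray(p_audio) + (1.0 - alpha) * np.asarray(p_visual)
    # Threshold to obtain a binary speech/non-speech decision per frame.
    return p_av > threshold

# Usage example: at -20 dB the decision is driven almost entirely by
# the visual stream, consistent with the robustness claim in the abstract.
audio_probs = [0.2, 0.3, 0.1]   # hypothetical audio VAD outputs
visual_probs = [0.9, 0.8, 0.95]  # hypothetical visual VAD outputs
print(av_vad_decision(audio_probs, visual_probs, snr_db=-20.0))
```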
Keywords :
audio-visual systems; discrete cosine transforms; signal denoising; speech processing; statistical analysis; 2D DCT visual features; 2D discrete cosine transform; AV-VAD; ETSI Aurora VAD; MFCC vector; SNR; audio speech information; audio-visual VAD; clean speech; noisy speech; robust voice activity detector; signal-to-noise ratio; statistical-based audio-only VAD; visual speech information; visual-only VAD; Accuracy; Feature extraction; Signal to noise ratio; Speech; Support vector machine classification; Visualization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2008 16th European Signal Processing Conference
Conference_Location :
Lausanne
ISSN :
2219-5491
Type :
conf
Filename :
7080692