DocumentCode
417221
Title
Voice activity detection using visual information
Author
Liu, Peng ; Wang, Zuoying
Author_Institution
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
In traditional voice activity detection (VAD) approaches, features of the audio stream, such as frame-energy features, are used for the voice/non-voice decision. In this paper, we present a general framework for visual-information-based VAD in a multi-modal system. First, Gaussian mixture visual models of voice and non-voice are designed, and the decision rule is discussed in detail. Subsequently, the visual feature extraction method for VAD is investigated, and the best visual feature structure and mixture number are selected experimentally. Our experiments show that, compared to the frame-energy-based approach in the clean-audio case, visual-information-based VAD achieves a substantial reduction in frame error rate (31.1% relative) and segments the audio-visual stream into sentences for recognition much more precisely (98.4% relative reduction in sentence break error rate). Furthermore, the performance of visual-based VAD is independent of background noise.
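The decision rule described in the abstract (scoring each frame's visual features under Gaussian mixture models of voice and non-voice) can be sketched as a log-likelihood-ratio test. This is a minimal illustration, not the paper's implementation: the diagonal-covariance GMM parameters, feature dimensionality, and threshold below are all hypothetical placeholders.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    # Log-likelihood of feature vector x (shape (D,)) under a
    # diagonal-covariance Gaussian mixture model.
    # weights: (K,), means: (K, D), variances: (K, D)
    log_comp = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
        - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    )
    # Sum the K component likelihoods in the log domain.
    return np.logaddexp.reduce(log_comp)

def vad_decision(frame_feat, voice_gmm, nonvoice_gmm, threshold=0.0):
    # Declare the frame "voice" when the log-likelihood ratio between
    # the voice and non-voice models exceeds the threshold.
    llr = gmm_loglik(frame_feat, *voice_gmm) - gmm_loglik(frame_feat, *nonvoice_gmm)
    return llr > threshold
```

In practice the two GMMs would be trained on labeled visual features (e.g. lip-region features), and the threshold tuned to trade off frame error rate against sentence break errors.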
Keywords
Gaussian distribution; error statistics; feature extraction; speech recognition; Gauss mixture visual models; VAD; audio-visual stream segmentation; background noise independence; decision rule; frame error rate; multi-modal system; performance; sentence break error rate; speech recognition; visual feature extraction; visual information; voice activity detection; Background noise; Crosstalk; Entropy; Error analysis; Feature extraction; Gaussian processes; Lips; Pattern recognition; Streaming media; Working environment noise;
fLanguage
English
Publisher
ieee
Conference_Titel
IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04)
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326059
Filename
1326059
Link To Document