Title :
Voice activity detection using subband noncircularity
Author :
Wisdom, Scott ; Okopal, Greg ; Atlas, Les ; Pitton, James
Author_Institution :
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA
Abstract :
Many voice activity detection (VAD) systems use the magnitude of complex-valued spectral representations. However, using only the magnitude often does not fully characterize the statistical behavior of the complex values. We present two novel methods for performing VAD on single- and dual-channel audio that do completely account for the second-order statistical behavior of complex data. Our methods exploit the second-order noncircularity (also known as impropriety) of complex subbands of speech and noise. Since speech tends to be more improper than noise, higher impropriety suggests speech activity. Our single-channel method is blind in the sense that it is unsupervised and, unlike many VAD systems, does not rely on non-speech periods for noise parameter estimation. Our methods achieve improved performance over other state-of-the-art magnitude-based VADs on the QUT-NOISE-TIMIT corpus, which indicates that impropriety is a compelling new feature for voice activity detection.
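Illustrative note (not part of the published abstract): the second-order noncircularity the abstract refers to is commonly summarized by the circularity coefficient of a zero-mean complex variable z, defined as |E[z^2]| / E[|z|^2]. It is close to 0 for proper (circular) data and approaches 1 for strongly improper data. The sketch below is only a sample estimator of that coefficient on synthetic data; the function name `circularity_coefficient`, the synthetic signals, and the sample count are assumptions made for illustration, and it is not the authors' detector, whose details the abstract does not give.

```python
import random


def circularity_coefficient(samples):
    """Sample estimate of |E[z^2]| / E[|z|^2] for a sequence of complex values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    centered = [z - mean for z in samples]            # remove the sample mean
    power = sum(abs(z) ** 2 for z in centered) / n    # covariance estimate E[|z|^2]
    if power == 0.0:
        return 0.0
    pseudo = sum(z * z for z in centered) / n         # complementary covariance E[z^2]
    return abs(pseudo) / power


rng = random.Random(0)
n = 20000

# Proper (circular) data: real and imaginary parts uncorrelated with equal variance.
proper = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# Improper data: imaginary part has much smaller variance than the real part.
improper = [complex(rng.gauss(0, 1), rng.gauss(0, 0.2)) for _ in range(n)]

k_proper = circularity_coefficient(proper)
k_improper = circularity_coefficient(improper)
print(f"proper: {k_proper:.3f}  improper: {k_improper:.3f}")
```

In a VAD setting one would presumably compute such a statistic per subband over short frames and compare it against a threshold, but the framing, subband decomposition, and decision rule used in the paper are not stated in this record.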
Keywords :
audio signal processing; signal detection; speech processing; statistical analysis; QUT-NOISE-TIMIT corpus; VAD systems; complex-valued spectral representation magnitude; dual-channel audio; noise parameter estimation; second-order noncircularity; second-order statistical behavior; single-channel audio; speech activity; subband noncircularity; voice activity detection systems; Frequency estimation; Noise measurement; Signal to noise ratio; Speech; Speech processing; Time-frequency analysis; Voice activity detection; complex-valued data; second-order statistics; spectral impropriety
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178823