Title :
Voice activity detection using harmonic frequency components in likelihood ratio test
Author :
Tan, Lee Ngee ; Borgstrom, Bengt J. ; Alwan, Abeer
Author_Institution :
Dept. of Electr. Eng., Univ. of California, Los Angeles, CA, USA
Abstract :
This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech/non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames, as opposed to unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise-robustness of ASR, its decisions are applied to pre-processing techniques such as non-linear spectral subtraction, minimum mean square error short-time spectral amplitude estimator, and frame dropping. From the ASR experiments conducted on the Aurora2 database, the proposed harmonic frequency-based LRTs give better results than conventional LRT-based VADs and the standard G.729B and ETSI AMR VADs.
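The bin-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the per-bin log LR below assumes the complex-Gaussian form commonly used in statistical model-based VADs, with a maximum-likelihood a priori SNR estimate. The function names, the rule of taking the nearest DFT bin to each multiple of the pitch, and the availability of a pitch estimate `f0_hz` and a noise variance estimate are all assumptions made for illustration.

```python
import math


def bin_log_lr(power, noise_var):
    """Log likelihood ratio of one DFT bin (assumed complex-Gaussian model)."""
    gamma = power / noise_var        # a posteriori SNR
    xi = max(gamma - 1.0, 0.0)       # ML a priori SNR estimate (assumption)
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def harmonic_bins(f0_hz, sample_rate, n_fft):
    """Indices of the DFT bins nearest to each multiple of f0 below Nyquist.

    Hypothetical selection rule; the paper may pick spectral peaks differently.
    """
    bins = []
    h = 1
    while h * f0_hz < sample_rate / 2:
        bins.append(round(h * f0_hz * n_fft / sample_rate))
        h += 1
    return sorted(set(bins))


def frame_log_lr(power_spectrum, noise_var, bins=None):
    """Mean log LR over the selected bins (all bins when bins is None)."""
    if bins is None:
        bins = range(len(power_spectrum))
    values = [bin_log_lr(power_spectrum[k], noise_var[k]) for k in bins]
    return sum(values) / len(values)


# Synthetic voiced frame: unit noise floor, extra energy only at harmonics.
# The 8 kHz rate, 256-point DFT, and 200 Hz pitch are illustrative values.
SAMPLE_RATE, N_FFT, F0 = 8000, 256, 200.0
noise = [1.0] * (N_FFT // 2 + 1)
spectrum = [1.0] * (N_FFT // 2 + 1)
selected = harmonic_bins(F0, SAMPLE_RATE, N_FFT)
for k in selected:
    spectrum[k] = 11.0

full_band = frame_log_lr(spectrum, noise)            # conventional: all bins
harmonic = frame_log_lr(spectrum, noise, selected)   # voiced: harmonic bins
```

In this synthetic frame the harmonic-only average is larger than the full-band average, because bins that carry only noise no longer dilute the statistic. A speech/non-speech decision would then compare the frame-level value against a threshold, which is left out here because the abstract does not specify one.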
Keywords :
discrete Fourier transforms; maximum likelihood estimation; speech recognition; ASR; Aurora2 database; DFT bins; VAD; harmonic frequency components; harmonic spectral peaks; likelihood ratio test; standard G.729B; statistical model based likelihood ratio test; voice activity detection; Automatic speech recognition; Feature extraction; Frequency; Hidden Markov models; Light rail systems; Noise robustness; Signal to noise ratio; Speech enhancement; Telecommunication standards; Testing; Voice activity detection; harmonic frequency; robust speech recognition; statistical model;
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495611