Title :
Robust speech detection and segmentation for real-time ASR applications
Author :
Shafran, Izhak ; Rose, Richard
Author_Institution :
AT&T Labs.-Res., USA
Abstract :
This paper provides a solution for robust speech detection that can be applied across a variety of tasks. The solution is based on an algorithm that performs non-parametric estimation of the background noise spectrum using minimum statistics of the smoothed short-time Fourier transform (STFT). It is shown that the new algorithm can operate effectively under varying signal-to-noise ratios. Results are reported on two tasks, HMIHY and SPINE, which differ in speaking style, background noise type, and bandwidth. With a computational cost of less than 2% of real time on a 1 GHz P-3 machine and a latency of 400 ms, the algorithm is suitable for real-time ASR applications.
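The abstract names the core idea (tracking the minimum of a smoothed STFT power spectrum as a noise estimate) but gives no details. The sketch below illustrates that general idea only and is not the authors' algorithm. Every function name and parameter value (frame length, hop, smoothing constant `alpha`, window of `win_frames` frames, `threshold_db`) is an assumption chosen for illustration. Bias compensation of the minimum, which practical minimum-statistics estimators usually include, is omitted.

```python
import numpy as np

def stft_power(x, frame_len=256, hop=128):
    """Power spectrogram of x, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def smooth_over_time(power, alpha=0.9):
    """First-order recursive smoothing of each frequency bin across frames."""
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, len(power)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * power[t]
    return smoothed

def minimum_statistics_noise(smoothed, win_frames=50):
    """Per-bin noise estimate: minimum of the smoothed power over the
    most recent win_frames frames (causal sliding window)."""
    noise = np.empty_like(smoothed)
    for t in range(len(smoothed)):
        start = max(0, t - win_frames + 1)
        noise[t] = smoothed[start : t + 1].min(axis=0)
    return noise

def detect_speech(x, frame_len=256, hop=128, alpha=0.9,
                  win_frames=50, threshold_db=6.0):
    """Flag frames whose smoothed power exceeds the tracked noise floor
    by more than threshold_db (an assumed, untuned decision rule)."""
    smoothed = smooth_over_time(stft_power(x, frame_len, hop), alpha)
    noise = minimum_statistics_noise(smoothed, win_frames)
    eps = 1e-12  # guards against division by zero on silent input
    ratio = (smoothed.sum(axis=1) + eps) / (noise.sum(axis=1) + eps)
    return 10.0 * np.log10(ratio) > threshold_db
```

In this sketch the minimum window must span more frames than the longest uninterrupted stretch of speech; otherwise the window contains only speech frames and the noise estimate climbs to the speech level. The recursive smoothing also makes the decision lag after speech ends, so a larger `alpha` gives a steadier noise floor at the cost of a longer hangover.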
Keywords :
Fourier transforms; nonparametric statistics; speech recognition; ASR; HMIHY; SPINE; STFT; background noise spectrum; background noise type; bandwidth; computational cost; minimum statistics; nonparametric estimation; robust speech detection; segmentation; signal-to-noise ratios; smoothed short-time Fourier transform; speaking style; Automatic speech recognition; Background noise; Bandwidth; Computational efficiency; Delay; Noise robustness; Signal to noise ratio; Statistics; Telephony; Working environment noise
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198810