DocumentCode :
1304544
Title :
Long-Term Spectro-Temporal and Static Harmonic Features for Voice Activity Detection
Author :
Fukuda, Takashi ; Ichikawa, Osamu ; Nishimura, Masafumi
Author_Institution :
IBM Res. - Tokyo, Yamato, Japan
Volume :
4
Issue :
5
Year :
2010
Firstpage :
834
Lastpage :
844
Abstract :
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. This paper proposes a statistical-model-based noise-robust VAD algorithm using long-term temporal information and harmonic-structure-based features in speech. Long-term temporal information has recently become an ASR focus, but has not yet been deeply investigated for VAD. In this paper, we first consider the temporal features in a cepstral domain calculated over the average phoneme duration. In contrast, the harmonic structures are well-known bearers of acoustic information in human voices, but that information is difficult to exploit statistically. This paper further describes a new method to exploit the harmonic structure information with statistical models, providing additional noise robustness. The proposed method including both the long-term temporal and the static harmonic features led to considerable improvements under low SNR conditions, with 77.7% error reduction on average as compared with the ETSI AFE-VAD in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system.
Keywords :
cepstral analysis; signal detection; speech recognition; ASR systems; ETSI AFE-VAD; automatic speech recognition systems; cepstral domain; error reduction; harmonic structure information; spectrotemporal feature; speech features; statistical-model-based noise-robust VAD algorithm; voice activity detection; Cepstrum; Feature extraction; Harmonic analysis; Signal to noise ratio; Speech; Speech recognition; Average phoneme duration; harmonic structure; long-term temporal information; voice activity detection (VAD);
Language :
English
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
Publisher :
IEEE
ISSN :
1932-4553
Type :
jour
DOI :
10.1109/JSTSP.2010.2069750
Filename :
5557742