DocumentCode :
694549
Title :
Onset detection algorithm in voice activity detection for Mandarin
Author :
Huan Wang ; Lei Wang
Author_Institution :
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2013
fDate :
12-13 Oct. 2013
Firstpage :
1148
Lastpage :
1151
Abstract :
Voice activity detection (VAD) is one of the most challenging problems in speech signal processing. Statistical model-based VADs have been widely studied in the recent literature, and they usually rely on hangover algorithms to prevent clipping of weak speech tails. However, little attention has been paid to initial consonants, and non-negligible onset detection errors can occur, especially when the SNR is low. Since most Mandarin syllables start with an initial consonant, this paper proposes an onset detection algorithm to improve VAD performance for Mandarin. Although consonants are mostly noise-like, their spectral energy is distributed more towards the higher frequencies. Exploiting this characteristic, the proposed algorithm uses the posterior SNR of the high-frequency band to decide whether detection of a weak start could have been dampened by noise. It then estimates whether weak-start speech frames have been mistaken for nonspeech frames and corrects the decision accordingly. The results show that the proposed algorithm achieves a considerable performance improvement. Furthermore, the algorithm is independent of noise type.
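The abstract does not give the decision rule, band limits, or thresholds, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It computes a posterior SNR restricted to the high-frequency band and then relabels nonspeech frames immediately before a detected onset when that SNR is high. The function names, `cutoff_hz`, `threshold`, and `max_lookback` are all hypothetical, and the noise power spectrum is assumed to be estimated elsewhere.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def high_band_posterior_snr(frame, noise_psd, sample_rate, cutoff_hz):
    """Observed power over estimated noise power, summed over bins >= cutoff_hz.

    noise_psd is an assumed, externally supplied noise power estimate with
    one value per bin (len(frame) // 2 + 1 values).
    """
    spec = power_spectrum(frame)
    first_bin = math.ceil(cutoff_hz * len(frame) / sample_rate)
    return sum(spec[first_bin:]) / sum(noise_psd[first_bin:])


def correct_onsets(decisions, hb_snr, threshold, max_lookback):
    """Relabel nonspeech frames just before each onset as speech.

    A frame is relabeled only if it lies within max_lookback frames of the
    onset, is contiguous with it, and has high-band posterior SNR above
    threshold. This rule is an illustrative assumption.
    """
    fixed = list(decisions)
    for i in range(1, len(decisions)):
        if decisions[i] and not decisions[i - 1]:  # onset in the raw decisions
            j = i - 1
            while (j >= 0 and i - j <= max_lookback
                   and not decisions[j] and hb_snr[j] > threshold):
                fixed[j] = True
                j -= 1
    return fixed
```

In this sketch the raw VAD decisions are left untouched everywhere except in a short window before each onset, which mirrors the way a hangover scheme extends decisions after an offset.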
Keywords :
maximum likelihood estimation; natural language processing; speech recognition; Mandarin syllables; hangover algorithms; high-frequency band; initial consonants; noise type; noise-like consonants; nonnegligible onset detection errors; nonspeech frames; onset detection algorithm; performance improvement; posterior SNR; spectral energy distribution; speech signal processing; statistical model-based VAD; voice activity detection; weak-speech tail clipping prevention; weak-start detection; weak-start speech frames; Detection algorithms; Hidden Markov models; Signal processing algorithms; Signal to noise ratio; Speech; Speech processing; likelihood ratio test; onset detection; voice activity detection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location :
Dalian
Type :
conf
DOI :
10.1109/ICCSNT.2013.6967305
Filename :
6967305