Title :
Improved Voice Activity Detection based on support vector machine with high separable speech feature vectors
Author :
Zou, Y.X. ; Zheng, W.Q. ; Wei Shi ; Hong Liu
Author_Institution :
Sch. of Electron. Comput. Eng., Peking Univ., Shenzhen, China
Abstract :
Voice Activity Detection (VAD) is one of the key techniques for many speech applications. Existing VAD algorithms perform unsatisfactorily under nonstationary noise and at low signal-to-noise ratio (SNR). Motivated by the fact that people are able to distinguish speech from non-speech even at low SNR, this paper studies VAD from the pattern recognition point of view, where VAD is formulated as a binary classification problem. Specifically, VAD is implemented by classifying the segments of the input signal as speech or non-speech. A support vector machine (SVM) with a radial basis function (RBF) kernel is employed in a supervised manner, which is well suited to binary classification tasks when training samples are available. To improve the accuracy and noise robustness of VAD, feature selection is conducted by introducing the class separation measure (CSM) criterion, which evaluates how well the extracted feature vectors separate speech from non-speech segments. The most widely used speech features are considered, including Mel-frequency cepstral coefficients (MFCC), the principal component analysis of the MFCC (PCA-MFCC), linear predictive coding (LPC) and linear predictive cepstral coding (LPCC). Extensive experiments show that the MFCC features capture the most relevant information in speech and retain good class separability under different noise conditions, as do the PCA-MFCC features. Moreover, the PCA-MFCC features are more robust to noise and have a lower computational cost. As a result, a VAD method that uses PCA-MFCC features with an RBF-SVM classifier has been developed, termed PCA-SVM-VAD. Experimental results on the NOIZEUS database show that the proposed PCA-SVM-VAD method clearly improves on other VAD methods and is considerably more robust in car noise environments at various SNRs.
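The abstract names two building blocks without giving their formulas: a class separation measure for ranking feature sets, and an RBF kernel for the SVM. The sketch below illustrates both on toy data. The separability score is an assumption: it is a Fisher-style ratio of between-class to within-class scatter, standing in for the paper's CSM, whose exact definition is not stated here. The function names, the `gamma` default and the toy values are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors.

    gamma=0.5 is an arbitrary illustrative value, not a setting from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def separability(speech, nonspeech):
    """Fisher-style score: trace(between-class scatter) / trace(within-class scatter).

    ASSUMPTION: a generic stand-in for the CSM criterion; larger means the two
    classes are easier to separate with these features.
    """
    global_mean = mean_vector(speech + nonspeech)
    between = 0.0
    within = 0.0
    for rows in (speech, nonspeech):
        class_mean = mean_vector(rows)
        # Between-class: squared distance of the class mean from the global mean,
        # weighted by the number of frames in the class.
        between += len(rows) * sum(
            (m - g) ** 2 for m, g in zip(class_mean, global_mean)
        )
        # Within-class: squared distance of each frame from its own class mean.
        within += sum(
            sum((v - m) ** 2 for v, m in zip(row, class_mean)) for row in rows
        )
    if within == 0.0:
        raise ValueError("within-class scatter is zero; score is undefined")
    return between / within


def column(rows, index):
    """Keep a single feature dimension, returned as one-element vectors."""
    return [[row[index]] for row in rows]


# Toy two-dimensional "feature vectors" (hypothetical values, not real MFCCs).
toy_speech = [[2.0, 0.0], [3.0, 1.0], [2.0, 1.0], [3.0, 0.0]]
toy_nonspeech = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

if __name__ == "__main__":
    for i in range(2):
        score = separability(column(toy_speech, i), column(toy_nonspeech, i))
        print(f"feature {i}: separability = {score}")
    print("kernel value:", rbf_kernel(toy_speech[0], toy_nonspeech[0]))
```

In this toy set the first dimension differs between the classes while the second is identically distributed, so the score ranks the first dimension higher. A feature-selection step of the kind the abstract describes would compare such scores across candidate feature sets (MFCC, PCA-MFCC, LPC, LPCC) before training the classifier.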
Keywords :
cepstral analysis; feature selection; linear predictive coding; principal component analysis; radial basis function networks; speech recognition; support vector machines; CSM criterion; LPCC; NOIZEUS database; PCA-MFCC; PCA-SVM-VAD; RBF-SVM; SNR; VAD technique; binary classification problem; class separation measure; high separable speech feature vector extraction; improved voice activity detection; linear predictive cepstral coding; mel-frequency cepstral coefficient; nonstationary noise; pattern recognition; principal component analysis of the MFCC; radial basis function; signal-to-noise-ratio; speech signal classification; support vector machine; Feature extraction; Mel frequency cepstral coefficient; Signal to noise ratio; Speech; Support vector machine classification; Vectors; MFCC features; voice activity detection;
Conference_Title :
Digital Signal Processing (DSP), 2014 19th International Conference on
Conference_Location :
Hong Kong
DOI :
10.1109/ICDSP.2014.6900767