Title :
Multi-layer perceptron based speech activity detection for speaker verification
Author :
Ganapathy, Sriram ; Rajan, Padmanabhan ; Hermansky, Hynek
Author_Institution :
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
In this paper, we present a speech activity detection (SAD) technique for speaker verification in noisy environments. The proposed SAD is based on phoneme posteriors derived from a multi-layer perceptron (MLP). The MLP is trained using modulation spectral features, where long temporal segments of the speech signal are analyzed in critical bands. In each sub-band, temporal envelopes are derived using an autoregressive modelling technique called frequency domain linear prediction (FDLP). Robustness of the sub-band envelopes is achieved by a minimum mean square envelope estimation technique. We also experiment with MFCC features processed with cepstral mean subtraction. The speech features are input to the trained MLP to estimate phoneme posterior probabilities. For SAD, all the speech phoneme probabilities are merged into one speech class to derive speech/non-speech decisions. The proposed SAD is applied to a speaker verification task using noisy versions of NIST 2008 speaker recognition evaluation (SRE) data, where it provides significant improvements (relative equal error rate (EER) improvement of about 9% in additive noise and about 19% in reverberant conditions). Furthermore, the improvements are consistent across the two front-ends (FDLP and MFCC) considered here.
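The merging step described in the abstract can be illustrated with a minimal sketch. It assumes the MLP output is a frames-by-classes matrix of posteriors with a single non-speech class. The function name, the non-speech class index, and the 0.5 threshold are hypothetical choices for illustration; the abstract does not specify the decision rule or any smoothing of the decisions.

```python
import numpy as np

def merge_speech_posteriors(posteriors, nonspeech_index=0, threshold=0.5):
    """Collapse per-frame phoneme posteriors into speech/non-speech decisions.

    posteriors: array of shape (num_frames, num_classes), rows sum to 1.
    nonspeech_index: column of the non-speech class (an assumption).
    threshold: decision threshold on the merged speech probability (an assumption).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    # Select every class except the non-speech one.
    speech_mask = np.ones(posteriors.shape[1], dtype=bool)
    speech_mask[nonspeech_index] = False
    # Merge all speech phoneme probabilities into a single speech class.
    speech_prob = posteriors[:, speech_mask].sum(axis=1)
    return speech_prob, speech_prob > threshold

# Toy posteriors for three frames: column 0 is non-speech, columns 1-2 are phonemes.
toy_posteriors = [
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
speech_prob, decisions = merge_speech_posteriors(toy_posteriors)
```

With these toy values only the second frame is labelled as speech, since its merged speech probability (0.8) is the only one above the assumed threshold.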
Keywords :
autoregressive processes; cepstral analysis; error statistics; least mean squares methods; maximum likelihood estimation; multilayer perceptrons; signal detection; speaker recognition; speech processing; MFCC; MLP; SAD; autoregressive modelling technique; cepstral mean subtraction; equal error rate; frequency domain linear prediction; minimum mean square envelope estimation; modulation spectral features; multilayer perceptron; phoneme posterior probability; speaker recognition evaluation; speaker verification; speech activity detection; speech phoneme probabilities; speech signal processing; temporal envelopes; temporal segments; Acoustics; Noise; Noise measurement; Speech; Speech processing; Speech recognition; Vectors; Frequency Domain Linear Prediction (FDLP); Speaker Verification; Speech Activity Detection;
Conference_Title :
Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on
Conference_Location :
New Paltz, NY
Print_ISBN :
978-1-4577-0692-9
Electronic_ISBN :
1931-1168
DOI :
10.1109/ASPAA.2011.6082323