Title :
On speech features fusion, α-integration Gaussian modeling and multi-style training for noise robust speaker classification
Author :
Venturini, A. ; Zao, L. ; Coelho, Rui
Author_Institution :
Lab. of Acoust. Signal Process., Mil. Inst. of Eng. (IME), Rio de Janeiro, Brazil
Abstract :
This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and statistical pH features to improve the performance of speaker verification (SV) in non-stationary noise conditions. The α-integrated Gaussian Mixture Model (α-GMM) classifier is adopted for speaker modeling. Two different approaches are applied to reduce the effects of noise corruption in the SV task: speech enhancement and multi-style training (MT). Spectral subtraction with minimum statistics (MS/SS) and the optimally-modified log-spectral amplitude estimator with improved minima controlled recursive averaging (IMCRA/OMLSA) are examined for speech enhancement. The MT techniques are based on colored (Colored-MT), white (White-MT) and narrow-band (Narrow-MT) noises. Six real non-stationary noises, collected from different acoustic sources, are used to corrupt the TIMIT speech database at four different signal-to-noise ratios (SNRs). The index of non-stationarity (INS) is used to test the stationarity of the acoustic noises. Complementary SV experiments are conducted in realistic noisy conditions using the MIT database. The results show that the best SV accuracy was obtained with the MFCC + pH features fusion, the MS/SS enhancement and the Colored-MT.
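The α-integration named in the title can be illustrated with a short sketch. It assumes that the α-GMM replaces the usual weighted sum of Gaussian component densities with Amari's α-mean of those densities; the abstract does not give the exact formulation, normalization or training procedure used in the paper. The function names below are hypothetical, and the components are univariate for brevity, whereas MFCC + pH feature frames would be multivariate.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def alpha_integrate(densities: Sequence[float], weights: Sequence[float], alpha: float) -> float:
    """Amari's alpha-mean of positive density values (hypothetical helper).

    alpha = -1 -> weighted arithmetic mean (ordinary GMM mixture);
    alpha =  1 -> weighted geometric mean;
    alpha =  3 -> weighted harmonic mean.
    Weights are assumed to be non-negative and to sum to 1.
    """
    if any(p <= 0.0 for p in densities):
        # log and negative powers are undefined at zero (e.g. pdf underflow)
        raise ValueError("densities must be strictly positive")
    if abs(alpha - 1.0) < 1e-12:
        # limiting case: weighted geometric mean computed in the log domain
        return math.exp(sum(w * math.log(p) for w, p in zip(weights, densities)))
    r = (1.0 - alpha) / 2.0  # exponent of the alpha-representation p ** r
    return sum(w * p ** r for w, p in zip(weights, densities)) ** (1.0 / r)


def alpha_gmm_score(x: float, weights: Sequence[float], means: Sequence[float],
                    variances: Sequence[float], alpha: float) -> float:
    """Score one observation by alpha-integrating the component densities (assumed model)."""
    comps = [gaussian_pdf(x, m, v) for m, v in zip(means, variances)]
    return alpha_integrate(comps, weights, alpha)


if __name__ == "__main__":
    # Hypothetical two-component model, for illustration only
    w, mu, var = [0.3, 0.7], [0.0, 2.0], [1.0, 0.5]
    for a in (-1.0, 0.0, 1.0, 3.0):
        print(a, alpha_gmm_score(0.8, w, mu, var, a))
```

With α = -1 the α-mean reduces to the weighted arithmetic mean, i.e. the ordinary GMM likelihood; with α = 1 it becomes the weighted geometric mean, and increasing α moves the score toward the smallest component density. For α ≠ -1 the result is generally not a normalized density, so in this sketch it is treated only as a classification score.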
Keywords :
Gaussian processes; mixture models; signal classification; speech enhancement; statistical analysis; α-integrated Gaussian mixture model classifier; α-integration Gaussian modeling; IMCRA; MFCC; OMLSA; colored-MT; improved minima controlled recursive averaging; mel-frequency cepstral coefficients; minimum statistics; multistyle training; narrow-MT; narrow-band noises; noise robust speaker classification; nonstationary noise conditions; optimally-modified log-spectral amplitude; signal-to-noise ratios; speaker verification; spectral subtraction; speech features fusion; statistical pH features; white-MT; IEEE transactions; Mel frequency cepstral coefficient; Noise; Speech; Training; α-GMM; Features fusion; Hurst exponent; multi-style training; non-stationary acoustic noise;
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2355821