DocumentCode :
59297
Title :
On speech features fusion, α-integration Gaussian modeling and multi-style training for noise robust speaker classification
Author :
Venturini, A. ; Zao, L. ; Coelho, Rui
Author_Institution :
Lab. of Acoust. Signal Process., Mil. Inst. of Eng. (IME), Rio de Janeiro, Brazil
Volume :
22
Issue :
12
fYear :
2014
fDate :
Dec. 2014
Firstpage :
1951
Lastpage :
1964
Abstract :
This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and statistical pH features to improve the performance of speaker verification (SV) in non-stationary noise conditions. The α-integrated Gaussian Mixture Model (α-GMM) classifier is adopted for speaker modeling. Two different approaches are applied to reduce the effects of noise corruption in the SV task: speech enhancement and multi-style training (MT). The spectral subtraction with minimum statistics (MS/SS) and the optimally-modified log-spectral amplitude with improved minima controlled recursive averaging (IMCRA/OMLSA) are examined for the speech enhancement procedure. The MT techniques are based on colored (Colored-MT), white (White-MT) and narrow-band (Narrow-MT) noises. Six real non-stationary noises, collected from different acoustic sources, are used to corrupt the TIMIT speech database at four different signal-to-noise ratios (SNRs). The index of non-stationarity (INS) is chosen for the stationarity tests of the acoustic noises. Complementary SV experiments are conducted in realistic noisy conditions using the MIT database. The results show that the best SV accuracy was obtained with the MFCC + pH features fusion, the MS/SS and the Colored-MT.
Keywords :
Gaussian processes; mixture models; signal classification; speech enhancement; statistical analysis; α-integrated Gaussian mixture model classifier; α-integration Gaussian modeling; IMCRA; MFCC; OMLSA; colored-MT; improved minima controlled recursive averaging; mel-frequency cepstral coefficients; minimum statistics; multistyle training; narrow-MT; narrow-band noises; noise robust speaker classification; nonstationary noise conditions; optimally-modified log-spectral amplitude; signal-to-noise ratios; speaker verification; spectral subtraction; speech enhancement; speech features fusion; statistical pH features; white-MT; IEEE transactions; Mel frequency cepstral coefficient; Noise; Speech; Speech enhancement; Training; α-GMM; Features fusion; Hurst exponent; multi-style training; non-stationary acoustic noise; speaker verification; speech enhancement;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2355821
Filename :
6894135