Regional Information Center for Science and Technology - On speech features fusion, α-integration Gaussian modeling and multi-style training for noise robust speaker classification

DocumentCode :

59297

Title :

On speech features fusion, α-integration Gaussian modeling and multi-style training for noise robust speaker classification

Author :

Venturini, A. ; Zao, L. ; Coelho, Rui

Author_Institution :

Lab. of Acoust. Signal Process., Mil. Inst. of Eng. (IME), Rio de Janeiro, Brazil

Volume :

Issue :

fYear :

2014

fDate :

Dec. 2014

Firstpage :

1951

Lastpage :

1964

Abstract :

This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and statistical pH features to improve the performance of speaker verification (SV) in non-stationary noise conditions. The α-integrated Gaussian Mixture Model (α-GMM) classifier is adopted for speaker modeling. Two different approaches are applied to reduce the effects of noise corruption in the SV task: speech enhancement and multi-style training (MT). The spectral subtraction with minimum statistics (MS/SS) and the optimally-modified log-spectral amplitude with improved minima controlled recursive averaging (IMCRA/OMLSA) are examined for the speech enhancement procedure. The MT techniques are based on colored (Colored-MT), white (White-MT) and narrow-band (Narrow-MT) noises. Six real non-stationary noises, collected from different acoustic sources, are used to corrupt the TIMIT speech database at four different signal-to-noise ratios (SNRs). The index of non-stationarity (INS) is chosen for the stationarity tests of the acoustic noises. Complementary SV experiments are conducted in realistic noisy conditions using the MIT database. The results show that the best SV accuracy was obtained with the MFCC + pH features fusion, the MS/SS and the Colored-MT.
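The abstract mentions corrupting clean speech with noise at fixed signal-to-noise ratios. A minimal sketch of that general idea follows, assuming the usual definition SNR = 10·log10(P_speech / P_noise) over the whole utterance. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", the Gaussian noise, and the SNR values in the demo are all illustrative assumptions; the record does not state which four SNRs or which mixing procedure the authors used.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db` (in dB), then add it to `speech`.

    Returns (noisy_signal, scaled_noise). Hypothetical helper, not the
    procedure from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); the amplitude
    # gain is the square root of the required power ratio.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


# Demo with synthetic data (assumed values, for illustration only).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
raw_noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
for target in (0, 5, 10, 15):  # placeholder SNRs, not taken from the paper
    _, scaled_noise = mix_at_snr(clean, raw_noise, target)
    print(target, round(measured_snr_db(clean, scaled_noise), 6))
```

In practice the noise segment would come from a recorded noise file and the speech power might be measured only over active speech frames; this sketch uses whole-signal power for simplicity.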

Keywords :

Gaussian processes; mixture models; signal classification; speech enhancement; statistical analysis; α-integrated Gaussian mixture model classifier; α-integration Gaussian modeling; IMCRA; MFCC; OMLSA; colored-MT; improved minima controlled recursive averaging; mel-frequency cepstral coefficients; minimum statistics; multistyle training; narrow-MT; narrow-band noises; noise robust speaker classification; nonstationary noise conditions; optimally-modified log-spectral amplitude; signal-to-noise ratios; speaker verification; spectral subtraction; speech enhancement; speech features fusion; statistical pH features; white-MT; IEEE transactions; Mel frequency cepstral coefficient; Noise; Speech; Speech enhancement; Training; α-GMM; Features fusion; Hurst exponent; multi-style training; non-stationary acoustic noise; speaker verification; speech enhancement;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2014.2355821

Filename :

6894135

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=59297