DocumentCode :
1596764
Title :
Pitch and MFCC dependent GMM models for speaker identification systems
Author :
Ezzaidi, Hassan ; Rouat, Jean
Author_Institution :
Quebec Univ., Chicoutimi, Que., Canada
Volume :
1
fYear :
2004
Firstpage :
43
Abstract :
Recently, we proposed an approach to speaker identification that jointly exploits vocal tract and glottis source information. The approach synchronously accounts for the correlation between the two sources of information. The proposed theoretical model, which uses a joint law, is presented. Some restrictions and simplifications are introduced to show the significance of this approach in a practical way. The fundamental frequency and MFCCs (Mel frequency cepstrum coefficients) represent the source and vocal tract information, respectively. In that earlier work, the probability density of the source was assumed to obey a uniform law, and tests were carried out with only female speakers from a telephone speech database (SPIDRE) recorded over various telephone handsets. Here, the source information is modeled by a Gaussian mixture model (GMM) rather than by the uniform probabilistic model. Tests are extended to all speakers of the SPIDRE database, and four systems are proposed and compared. The first is a baseline system based on MFCCs that uses no information from the source. The second examines only the voiced segments of the speech signal. The last two implement the suggested approach with two techniques: in one the source information is taken to follow a normal distribution, and in the other a log-normal distribution. With the proposed approach, the gain in performance is 10.5% for women, 7% for men and 8% for all speakers.
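As an illustration only, the scoring idea in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary layout of the speaker models, and the factorization used here are hypothetical. In particular, the sketch adds a per-frame pitch log-density (normal or log-normal) to a diagonal-covariance GMM log-density over the MFCC vector, which treats the two streams as independent given the speaker; the paper's joint law accounts for their correlation, and its exact form is not given in this record.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_lognormal_pdf(x, mu, sigma):
    """Log density of a log-normal distribution at x > 0.

    mu and sigma are the mean and standard deviation of ln(x).
    """
    return log_normal_pdf(math.log(x), mu, sigma) - math.log(x)


def log_gmm_pdf(vec, weights, means, stds):
    """Log density of a diagonal-covariance GMM at vec (log-sum-exp form)."""
    comps = []
    for w, m, s in zip(weights, means, stds):
        lp = math.log(w)
        lp += sum(log_normal_pdf(v, mi, si) for v, mi, si in zip(vec, m, s))
        comps.append(lp)
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


def identify(frames, speakers, pitch_law="normal"):
    """Return (best_speaker, scores) for a list of voiced frames.

    frames   : list of (mfcc_vector, f0_hz) pairs
    speakers : dict name -> {"gmm": (weights, means, stds),
                             "pitch": (mean, std)}   # hypothetical layout
    Assumption (not from the source): MFCC and pitch terms are simply added,
    i.e. treated as independent given the speaker.
    """
    pitch_pdf = log_lognormal_pdf if pitch_law == "lognormal" else log_normal_pdf
    scores = {}
    for name, model in speakers.items():
        total = 0.0
        for mfcc, f0 in frames:
            total += log_gmm_pdf(mfcc, *model["gmm"])
            total += pitch_pdf(f0, *model["pitch"])
        scores[name] = total
    best = max(scores, key=scores.get)
    return best, scores
```

With `pitch_law="lognormal"`, the `"pitch"` entry is read as the mean and standard deviation of ln(f0) rather than of f0 itself. Dropping the pitch term altogether gives a scorer in the spirit of the MFCC-only baseline described above.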
Keywords :
Gaussian processes; correlation methods; log normal distribution; normal distribution; speaker recognition; GMM; Gaussian mixture model; MFCC; Mel frequency cepstrum coefficients; fundamental frequency; glottis source information; lognormal distribution; pitch; probability density; speaker identification systems; speech telephony database; vocal signal voiced segments; uniform probabilistic model; vocal tract information; Cepstrum; Databases; Gaussian distribution; Information resources; Log-normal distribution; Mel frequency cepstral coefficient; Speech; System testing; Telephone sets; Telephony
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical and Computer Engineering, 2004. Canadian Conference on
ISSN :
0840-7789
Print_ISBN :
0-7803-8253-6
Type :
conf
DOI :
10.1109/CCECE.2004.1344954
Filename :
1344954