DocumentCode
3430643
Title
Vocal source features for bilingual speaker identification
Author
Jianglin Wang; Johnson, Matthew Thomas
Author_Institution
Dept. of Electr. & Comput. Eng., Marquette Univ., Milwaukee, WI, USA
fYear
2013
fDate
6-10 July 2013
Firstpage
170
Lastpage
173
Abstract
This paper introduces two new features for speaker identification, Residual Phase Cepstrum Coefficients (RPCC) and Glottal Flow Cepstrum Coefficients (GLFCC), which capture speaker-specific characteristics from vocal excitation patterns. Results on a cross-lingual speaker identification task drawn from the NIST 2004 SRE demonstrate that the RPCC and GLFCC features yield significantly higher accuracy than traditional mel-frequency cepstral coefficients (MFCC). In particular, the two new features give better results with smaller amounts of training data, due to lower model complexity.
Keywords
Gaussian processes; maximum likelihood estimation; natural language processing; speaker recognition; GLFCC; GMM-UBM; Gaussian mixture model-universal background model; NIST 2004 SRE; RPCC; bilingual speaker identification; cross-lingual speaker identification task; glottal flow cepstrum coefficients; maximum a posteriori adaptation; model complexity; residual phase cepstrum coefficients; speaker-specific characteristics; vocal excitation patterns; vocal source features; Accuracy; Adaptation models; Feature extraction; Filtering; Mel frequency cepstral coefficient; Speaker recognition; Speech; Glottal source excitation; IAIF and GMM; Speaker identification
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on
Conference_Location
Beijing
Type
conf
DOI
10.1109/ChinaSIP.2013.6625321
Filename
6625321