DocumentCode
3484440
Title
Linear versus mel frequency cepstral coefficients for speaker recognition
Author
Zhou, Xinhui ; Garcia-Romero, Daniel ; Duraiswami, Ramani ; Espy-Wilson, Carol ; Shamma, Shihab
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA
fYear
2011
fDate
11-15 Dec. 2011
Firstpage
559
Lastpage
564
Abstract
Mel-frequency cepstral coefficients (MFCC) have been the dominant features in speaker recognition as well as in speech recognition. However, according to speech production theory, some speaker characteristics associated with the structure of the vocal tract, particularly the vocal tract length, are reflected more in the high-frequency range of speech. This insight suggests that a linear frequency scale may offer some advantages over the mel scale in speaker recognition. Using two state-of-the-art speaker recognition back-end systems (one Joint Factor Analysis system and one Probabilistic Linear Discriminant Analysis system), this study compares the performance of MFCC and LFCC (linear-frequency cepstral coefficients) in the NIST SRE (Speaker Recognition Evaluation) 2010 extended-core task. Our results on SRE10 show that, while the two feature sets are complementary, LFCC consistently outperforms MFCC, mainly due to its better performance in the female trials. This can be explained by the relatively shorter vocal tract in females and the resulting higher formant frequencies in speech: LFCC benefits female speech more by better capturing the spectral characteristics in the high-frequency region. In addition, our results show some advantage of LFCC over MFCC in reverberant speech. LFCC is as robust as MFCC in babble noise, but not in white noise. We conclude that LFCC should be more widely used, at least for the female trials, by the mainstream speaker recognition community.
Keywords
probability; speaker recognition; Mel-frequency cepstral coefficients; babble noise; joint factor analysis system; linear frequency cepstral coefficients; probabilistic linear discriminant analysis system; reverberant speech; speaker recognition back-end systems; spectral characteristics; speech production theory; speech recognition; vocal tract length; white noise; Maximum likelihood detection; Mel frequency cepstral coefficient; NIST; Nonlinear filters; Speaker recognition; Speech; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location
Waikoloa, HI
Print_ISBN
978-1-4673-0365-1
Electronic_ISBN
978-1-4673-0366-8
Type
conf
DOI
10.1109/ASRU.2011.6163888
Filename
6163888