Auditory model representation for speaker recognition

Author

Colombi, John ; Anderson, Timothy R. ; Rogers, Steven K. ; Ruck, Dennis W. ; Warhola, G.T.

Author_Institution

AFIT/EN, Wright-Patterson AFB, OH, USA

Volume

2

fYear

1993

fDate

27-30 April 1993

Firstpage

700

Abstract

An examination of the KING database is presented that compares proven spectral processing techniques with an auditory model representation for speaker recognition. The feature sets compared are LPC (linear predictive coding) cepstral coefficients and auditory nerve firing rates provided by the Payton model. The two feature sets were quantized by two clustering algorithms, a Linde-Buzo-Gray algorithm and a Kohonen self-organizing feature map. The resulting vector-quantization, distortion-based classification indicates that the auditory model provides accuracies comparable with LPC cepstral coefficients in nonstudio-quality environments and over multiple sessions. For a 10-speaker subset using only voiced frames of 15-s segments, both feature sets achieve an identification rate of over 80%. The cepstral features perform better on verification tasks, as measured with receiver operating characteristic curves.
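The classification scheme named in the abstract (one codebook per speaker, with a test segment assigned to the speaker whose codebook gives the lowest average quantization distortion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not give the codebook size, the distortion measure, or the training settings, so the squared Euclidean distance, the codebook size of 4, the additive split perturbation, and the synthetic 2-D "frames" standing in for cepstral or firing-rate vectors are all hypothetical choices. The Linde-Buzo-Gray training shown here uses the common binary-splitting variant.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def lbg(vectors, size, eps=0.01, iters=20):
    """Train a codebook by LBG binary splitting (size should be a power of two)."""
    codebook = [mean_vec(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            # Move each codeword to the centroid of its cell (keep it if the cell is empty).
            codebook = [mean_vec(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def avg_distortion(codebook, vectors):
    """Average distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(cw, v) for cw in codebook) for v in vectors) / len(vectors)


def identify(codebooks, vectors):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(codebooks[name], vectors))


# Synthetic stand-in data: two hypothetical speakers with well-separated features.
rng = random.Random(0)


def synth(center, n):
    return [[rng.gauss(c, 0.3) for c in center] for _ in range(n)]


train = {"spk_a": synth([0.0, 0.0], 200), "spk_b": synth([3.0, 3.0], 200)}
codebooks = {name: lbg(frames, size=4) for name, frames in train.items()}
test_frames = synth([3.0, 3.0], 50)
print(identify(codebooks, test_frames))
```

For verification rather than identification, the same average distortion against a claimed speaker's codebook would be compared with a threshold; sweeping that threshold traces out a receiver operating characteristic curve.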

Keywords

hearing; linear predictive coding; physiological models; self-organising feature maps; speech recognition; vector quantisation; KING database; Kohonen self-organizing feature map; Linde-Buzo-Gray algorithm; accuracies; auditory model representation; auditory nerve firing rates; cepstral coefficients; clustering algorithms; identification rate; speaker recognition; vector quantized distortion based classification; verification

fLanguage

English

Publisher

IEEE

Conference_Title

1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93)

Conference_Location

Minneapolis, MN, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.1993.319407

Filename

319407