Regional Information Center for Science and Technology - Fusion of two classifiers for speaker identification: removing and not removing silence

DocumentCode :

451019

Title :

Fusion of two classifiers for speaker identification: removing and not removing silence

Author :

Hu, Roland ; Damper, R.I.

Author_Institution :

Sch. of Electron. & Comput. Sci., Southampton Univ., UK

Volume :

fYear :

2005

fDate :

25-28 July 2005

Abstract :

In designing the speaker recognition part of an audiovisual person identification system, we have found that the identification rate improves after (automatically) removing silence (both intra- and inter-word) from the speech signal. Conversely, we also find that some tokens that are incorrectly identified when silence is removed are correctly identified when it is not. Hence, in an attempt to improve performance, speech signals with and without silence are fed separately into two different text-independent speaker classifiers, and the weighted sum rule is used to fuse their outputs. The main contribution of this paper is to propose a new theoretical method for finding the weighting parameter(s) for the weighted sum rule. By assuming multi-modal Gaussian distributions, we have changed the problem of choosing weighting parameters for classifier fusion to that of maximizing the correct identification estimation function; our solution is applicable to many other fusion scenarios. Once the parameters of the multi-modal distributions have been estimated from example data, we can use the theory to calculate correct identification rates as a function of the (single) weighting parameter α for the two-classifier combination. Doing this for a range of α values allows the optimal point to be found. This theoretical method is tested against empirical determination, in which we use the actual speaker recognizer in place of the theory. The comparison is done using data from 74 speakers (51 male, 23 female) in the XM2VTS database. A clearer optimal point is found with the theoretical method.
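
The empirical side of the comparison described in the abstract (fusing two classifiers' outputs with a weighted sum and scanning α for the best identification rate) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the rule, so the convex combination α·A + (1 − α)·B is assumed here, and the function names, the grid of α values, and the toy scores are all hypothetical. The theoretical method based on multi-modal Gaussian distributions is not reproduced.

```python
def fuse_scores(scores_a, scores_b, alpha):
    """Weighted sum rule (assumed form): alpha * A + (1 - alpha) * B per speaker."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]


def identify(scores):
    """Return the index of the highest-scoring speaker."""
    return max(range(len(scores)), key=lambda i: scores[i])


def identification_rate(tokens, alpha):
    """Fraction of tokens whose fused top-scoring speaker is the true speaker."""
    correct = sum(
        1
        for scores_a, scores_b, true_id in tokens
        if identify(fuse_scores(scores_a, scores_b, alpha)) == true_id
    )
    return correct / len(tokens)


def sweep_alpha(tokens, steps=11):
    """Evaluate the identification rate on an evenly spaced grid of alpha in [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [(alpha, identification_rate(tokens, alpha)) for alpha in grid]


# Made-up scores for three speakers (NOT data from the paper).
# Each entry: (scores from classifier A, scores from classifier B, true speaker).
# A stands in for the silence-removed classifier, B for the unmodified one.
TOKENS = [
    ([0.9, 0.1, 0.0], [0.2, 0.7, 0.1], 0),    # A is right, B is wrong
    ([0.3, 0.6, 0.1], [0.85, 0.1, 0.05], 0),  # A is wrong, B is right
    ([0.1, 0.2, 0.7], [0.1, 0.3, 0.6], 2),    # both are right
]

if __name__ == "__main__":
    for alpha, rate in sweep_alpha(TOKENS):
        print(f"alpha={alpha:.1f}  rate={rate:.2f}")
```

In this toy set each classifier alone (α = 0 or α = 1) misidentifies one token, while an intermediate α identifies all three, which mirrors the motivation for fusion given in the abstract.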

Keywords :

Gaussian distribution; audio-visual systems; signal classification; speaker recognition; XM2VTS database; audiovisual person identification; classifier fusion; empirical determination; multimodal Gaussian distribution; speaker recognizer; speech signal; text-independent speaker classifier; weighted sum rule; Biometrics; Computer science; Databases; Fuses; Signal design; Signal processing; Speech; Testing; Gaussian mixture models; speaker identification

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Fusion, 2005 8th International Conference on

Print_ISBN :

0-7803-9286-8

Type :

conf

DOI :

10.1109/ICIF.2005.1591887

Filename :

1591887

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=451019