DocumentCode :
989830
Title :
Discriminative In-Set/Out-of-Set Speaker Recognition
Author :
Angkititrakul, Pongtep ; Hansen, John H L
Author_Institution :
Center for Robust Speech Syst., Texas Univ., Richardson, TX
Volume :
15
Issue :
2
fYear :
2007
Firstpage :
498
Lastpage :
508
Abstract :
In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to form a decision regarding an input speaker as being a legitimate member of a set of enrolled speakers or outside speakers. The general goal is to perform rapid speaker model construction from limited enrollment and test size resources for in-set testing for input audio streams. In-set detection can help ensure security and proper access to private information, as well as detecting and tracking input speakers. Areas of applications of these concepts include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short-enrollment data (about 5 s of speech for each enrolled speaker) and test data (2-8 s) within a text-independent mode. We present a simple and yet powerful decision rule to accept or reject speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training is introduced to further improve system performance for both decision techniques, by employing minimum classification error and minimum verification error frameworks. Experiments are performed using three separate corpora. Using the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance with relative improvements ranging from 11.26%-28.68%. A further extended evaluation using the TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP based scheme using the likelihood ratio test (MAP-LRT), with average EERs of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. 
The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
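As an illustration of the conventional likelihood ratio test that the abstract uses as its baseline, the following is a minimal sketch only. It assumes each enrolled speaker model and a background model have already produced an average log-likelihood for the test utterance. The function name, argument names, and threshold value are hypothetical and do not come from the paper, and the sketch does not implement the paper's discriminative-vector decision rule or its discriminative training.

```python
from typing import Dict, Tuple


def in_set_decision(
    speaker_loglik: Dict[str, float],
    background_loglik: float,
    threshold: float,
) -> Tuple[bool, str, float]:
    """Accept or reject a test utterance as coming from an enrolled (in-set) speaker.

    speaker_loglik: hypothetical mapping of enrolled speaker ID to the
        log-likelihood of the test utterance under that speaker's model.
    background_loglik: log-likelihood under a background (out-of-set) model.
    threshold: decision threshold on the log-likelihood ratio (assumed tuned
        on held-out data).
    """
    # Pick the enrolled speaker whose model best explains the utterance.
    best_speaker = max(speaker_loglik, key=speaker_loglik.get)
    # Log-likelihood ratio between the best in-set model and the background model.
    llr = speaker_loglik[best_speaker] - background_loglik
    # Accept as in-set only if the ratio clears the threshold.
    return llr >= threshold, best_speaker, llr


if __name__ == "__main__":
    scores = {"spk_a": -10.0, "spk_b": -12.0}
    print(in_set_decision(scores, background_loglik=-11.0, threshold=0.5))
```

Sweeping the threshold trades false acceptances against false rejections; the equal error rate (EER) quoted in the abstract is the operating point where the two rates are equal.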
Keywords :
speaker recognition; speech processing; statistical testing; MAP based scheme; aircraft communications data; communication networks; discriminative vector; in-set speaker recognition; in-set testing; information retrieval; input audio streams; location access; minimum classification error; minimum verification error; out-of-set speaker recognition; personal device assistants; speaker model construction; speaker tagging; speaker tracking; statistical hypothesis testing; text-independent mode; Aircraft; Communication system security; Data security; Performance evaluation; Signal to noise ratio; Speaker recognition; Streaming media; System performance; System testing; Tagging; Decision score space; discriminative training; in-set/out-of-set; limited training data; minimum classification error; minimum verification error; speaker recognition;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2006.881689
Filename :
4067020