Regional Information Center for Science and Technology - Keyword-conditioned phone N-gram modeling with contextual information for speaker verification

DocumentCode :

3165601

Title :

Keyword-conditioned phone N-gram modeling with contextual information for speaker verification

Author :

Han, Kyu J. ; Pelecanos, Jason ; Omar, Mohamed K.

Author_Institution :

IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4797

Lastpage :

4800

Abstract :

In this paper we present our current work on automatic speaker recognition using keyword-conditioned phone N-gram modeling. We propose the use of contextual information around keywords in modeling a speaker's pronunciation characteristics at a phonetic level. Our approach is to add time margins around keywords when aligning keyword regions with keyword-specific phone events for feature vector generation. Including such additional information by incorporating time margins can capture idiosyncratic pronunciation information and is shown to help our keyword-conditioned phonetic speaker verification system achieve more than 50% (relative) performance improvement. This leads our high-level speaker verification system (i.e., fusion of non-conditioned and keyword-conditioned phonetic speaker verification systems) to currently achieve the best published result for the English 8-conversation enrollment telephony task of the 2008 NIST Speaker Recognition Evaluation for systems utilizing features not based directly on low-level acoustic information.
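
The time-margin idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(label, start, end)` tuple layout, the midpoint-based alignment rule, the bigram order and the relative-frequency normalization are all assumptions made for illustration, and the phone and keyword timings are invented toy values.

```python
from collections import Counter

def phones_in_region(phones, start, end, margin=0.0):
    """Labels of phone events whose midpoint lies inside the keyword
    region widened by `margin` seconds on each side (assumed rule)."""
    lo, hi = start - margin, end + margin
    return [label for label, p_start, p_end in phones
            if lo <= (p_start + p_end) / 2.0 <= hi]

def keyword_ngram_features(phones, keyword_regions, margin=0.0, n=2):
    """Per-keyword phone n-gram relative frequencies, pooled over all
    occurrences of each keyword (hypothetical feature layout)."""
    counts = {}
    for keyword, start, end in keyword_regions:
        labels = phones_in_region(phones, start, end, margin)
        grams = counts.setdefault(keyword, Counter())
        for i in range(len(labels) - n + 1):
            grams[tuple(labels[i:i + n])] += 1
    features = {}
    for keyword, grams in counts.items():
        total = sum(grams.values())
        features[keyword] = (
            {g: c / total for g, c in grams.items()} if total else {}
        )
    return features

# Toy phone recognizer output: (label, start_sec, end_sec).
phones = [("sil", 0.00, 0.10), ("y", 0.10, 0.18), ("uw", 0.18, 0.30),
          ("n", 0.30, 0.38), ("ow", 0.38, 0.55), ("ax", 0.55, 0.62)]
# Toy keyword alignment: (keyword, start_sec, end_sec).
keyword_regions = [("know", 0.30, 0.55)]

narrow = keyword_ngram_features(phones, keyword_regions, margin=0.0)
wide = keyword_ngram_features(phones, keyword_regions, margin=0.1)
print(narrow)
print(wide)
```

With no margin, only the phones inside the keyword itself contribute n-grams; with a 0.1 s margin, the neighbouring phones are included as well, so the feature vector also reflects how the speaker moves into and out of the keyword.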

Keywords :

feature extraction; natural language processing; speaker recognition; telephony; 2008 NIST speaker recognition evaluation; English 8-conversation enrollment telephony task; automatic speaker recognition; contextual information; feature vector generation; high-level speaker verification system; idiosyncratic pronunciation information; keyword-conditioned phone N-gram modeling; keyword-conditioned phonetic speaker verification system; keyword-specific phone events; low-level acoustic information; phonetic level; speaker pronunciation characteristics; time margins; Acoustics; Context modeling; Feature extraction; NIST; Speaker recognition; Support vector machines; Vectors; Speaker verification; contextual information; keyword-conditioned phone N-gram modeling; time margin;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288992

Filename :

6288992

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3165601