DocumentCode :
3165601
Title :
Keyword-conditioned phone N-gram modeling with contextual information for speaker verification
Author :
Han, Kyu J. ; Pelecanos, Jason ; Omar, Mohamed K.
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4797
Lastpage :
4800
Abstract :
In this paper we present our current work on automatic speaker recognition using keyword-conditioned phone N-gram modeling. We propose the use of contextual information around keywords in modeling a speaker's pronunciation characteristics at a phonetic level. Our approach is to add time margins around keywords when aligning keyword regions with keyword-specific phone events for feature vector generation. Including such additional information by incorporating time margins can capture idiosyncratic pronunciation information and is shown to help our keyword-conditioned phonetic speaker verification system achieve more than 50% (relative) performance improvement. This leads our high-level speaker verification system (i.e., fusion of non-conditioned and keyword-conditioned phonetic speaker verification systems) to currently achieve the best published result for the English 8-conversation enrollment telephony task of the 2008 NIST Speaker Recognition Evaluation for systems utilizing features not based directly on low-level acoustic information.
Keywords :
feature extraction; natural language processing; speaker recognition; telephony; 2008 NIST speaker recognition evaluation; English 8-conversation enrollment telephony task; automatic speaker recognition; contextual information; feature vector generation; high-level speaker verification system; idiosyncratic pronunciation information; keyword-conditioned phone N-gram modeling; keyword-conditioned phonetic speaker verification system; keyword-specific phone events; low-level acoustic information; phonetic level; speaker pronunciation characteristics; time margins; Acoustics; Context modeling; Feature extraction; NIST; Speaker recognition; Support vector machines; Vectors; Speaker verification; contextual information; keyword-conditioned phone N-gram modeling; time margin
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288992
Filename :
6288992
Link To Document :