Regional Information Center for Science and Technology (RICEST) - Unsupervised idiolect discovery for speaker recognition

DocumentCode :

178041

Title :

Unsupervised idiolect discovery for speaker recognition

Author :

Jansen, Aren ; Garcia-Romero, Daniel ; Clark, P. ; Hernandez-Cordero, Jaime

Author_Institution :

Human Language Technol. Center of Excellence, Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

1675

Lastpage :

1679

Abstract :

Short-time spectral characterizations of the human voice have proven to be the most dependable features available to modern speaker recognition systems. However, it is well known that high-level linguistic information such as word usage and pronunciation patterns can provide complementary discriminative power. In an automatic setting, the availability of these idiolectal cues is dependent on access to a word or phonetic tokenizer, ideally in the given language and domain. In this paper, we propose a novel approach to speaker recognition that leverages recently developed zero-resource term discovery algorithms to identify speaker-characteristic lexical and phrasal acoustic patterns without the need for any supervised speech recognition tools. We use the enrollment audio itself to score each trial and perform no model training (supervised or unsupervised) at any stage of the processing, allowing immediate application to any language or domain. We evaluate our approach on the extended 8-conversation core condition of the 2010 NIST SRE and demonstrate a 16% relative (0.06 absolute) reduction in minDCF when combined with a state-of-the-art unsupervised i-vector cosine system.

Keywords :

speaker recognition; speech processing; vectors; 2010 NIST SRE; complementary discriminative power; extended 8-conversation core condition; high-level linguistic information; human voice characterization; minDCF reduction; phonetic tokenizer; phrasal acoustic pattern; pronunciation pattern; short-time spectral characterization; speaker-characteristic lexical pattern; supervised speech recognition tool; unsupervised i-vector cosine system; unsupervised idiolect discovery; zero-resource term discovery algorithm; Acoustics; Feature extraction; Hidden Markov models; NIST; Speaker recognition; Speech; Speech recognition; Zero resource; idiolect; speaker recognition; unsupervised term discovery;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6853883

Filename :

6853883

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178041