Title :
Unsupervised idiolect discovery for speaker recognition
Author :
Jansen, Anton ; Garcia-Romero, Daniel ; Clark, P. ; Hernandez-Cordero, Juan
Author_Institution :
Human Language Technol. Center of Excellence, Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
Short-time spectral characterizations of the human voice have proven to be the most dependable features available to modern speaker recognition systems. However, it is well known that high-level linguistic information such as word usage and pronunciation patterns can provide complementary discriminative power. In an automatic setting, the availability of these idiolectal cues depends on access to a word or phonetic tokenizer, ideally in the given language and domain. In this paper, we propose a novel approach to speaker recognition that leverages recently developed zero-resource term discovery algorithms to identify speaker-characteristic lexical and phrasal acoustic patterns without the need for any supervised speech recognition tools. We use the enrollment audio itself to score each trial and perform no model training (supervised or unsupervised) at any stage of the processing, allowing immediate application to any language or domain. We evaluate our approach on the extended 8-conversation core condition of the 2010 NIST SRE and demonstrate a 16% relative (0.06 absolute) reduction in minDCF when combined with a state-of-the-art unsupervised i-vector cosine system.
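The relative and absolute minDCF reductions quoted in the abstract together imply a baseline value, although the abstract does not state one. The sketch below works through that arithmetic, assuming both figures refer to the same operating point and are exact rather than rounded; the function name `implied_costs` is hypothetical and used only for illustration.

```python
from typing import Tuple


def implied_costs(relative_reduction: float, absolute_reduction: float) -> Tuple[float, float]:
    """Return the (baseline, combined) cost implied by a relative and an absolute reduction.

    Assumes relative_reduction = absolute_reduction / baseline, the usual
    definition of a relative reduction in an error or cost metric.
    """
    if not 0.0 < relative_reduction < 1.0:
        raise ValueError("relative_reduction must lie strictly between 0 and 1")
    if absolute_reduction <= 0.0:
        raise ValueError("absolute_reduction must be positive")
    baseline = absolute_reduction / relative_reduction
    combined = baseline - absolute_reduction
    return baseline, combined


# Figures taken from the abstract: 16% relative, 0.06 absolute.
baseline, combined = implied_costs(0.16, 0.06)
print(f"implied baseline minDCF: {baseline:.3f}")
print(f"implied combined minDCF: {combined:.3f}")
```

Under these assumptions the i-vector baseline would sit near 0.375 and the combined system near 0.315. Because the published percentages are probably rounded, these should be read as approximate values, not as results reported by the authors.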
Keywords :
speaker recognition; speech processing; vectors; 2010 NIST SRE; complementary discriminative power; extended 8-conversation core condition; high-level linguistic information; human voice characterization; minDCF reduction; phonetic tokenizer; phrasal acoustic pattern; pronunciation pattern; short-time spectral characterization; speaker-characteristic lexical pattern; supervised speech recognition tool; unsupervised i-vector cosine system; unsupervised idiolect discovery; zero-resource term discovery algorithm; Acoustics; Feature extraction; Hidden Markov models; NIST; Speech; Speech recognition; Zero resource; idiolect; unsupervised term discovery
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6853883