Title :
Fisher Linear Semi-Discriminant Analysis for Speaker Diarization
Author :
Giannakopoulos, Theodoros ; Petridis, Sergios
Author_Institution :
NCSR “DEMOKRITOS”, Agia Paraskevi, Greece
Abstract :
Given an audio signal with an unknown number of people speaking, speaker diarization aims to automatically answer the question “who spoke when.” Crucial to the success of diarization is the distance metric between speech segments, which depends on the choice of feature space: distances should be low for segments of the same speaker and high for segments of different speakers. Starting from a Mel-frequency cepstrum coefficient (MFCC)-based feature space, an algorithm is proposed that finds a Fisher near-optimal linear discriminant subspace, adapted to the particular speakers present in the audio signal. The proposed approach relies on a semi-supervised version of Fisher linear discriminant analysis (FLD), leveraging information from the sequential structure of the audio signal as a substitute for the unknown speaker labels. The resulting algorithm is completely unsupervised: it requires no speaker labels, either for the given recording or for an independent set. Eigenvalue perturbation theory is applied to provide optimality bounds with respect to FLD, showing the effectiveness of the approach under the assumption that speakers do not significantly modify the characteristics of their voice. A complete diarization system is then proposed, using fuzzy clustering, a non-parametric K-nearest neighbors classifier, and a hidden Markov model. The experimental results show a major improvement in speaker diarization accuracy when using the subspace found by the proposed approach, compared with using the initial MFCC feature space or subspaces found by competing approaches.
Keywords :
audio signal processing; eigenvalues and eigenfunctions; fuzzy set theory; hidden Markov models; speaker recognition; speech processing; FLD; Fisher linear semidiscriminant analysis; Fisher near-optimal linear discriminant subspace; MFCC-based feature space; audio signal; distance metric; eigenvalue perturbation theory; fuzzy clustering; hidden Markov model; mel-frequency cepstrum coefficient; nonparametric K-nearest neighbors classifier; semisupervised version; speaker diarization; speaker label; speech segment; Covariance matrix; Feature extraction; Message systems; Speech; Vectors; Fisher linear discriminant analysis (FLD)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2191285
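Illustrative_Sketch :
The abstract's central idea, using the sequential structure of the signal in place of speaker labels when computing a Fisher discriminant subspace, can be illustrated with the rough Python sketch below. It is not the authors' algorithm: it assumes, purely for illustration, that consecutive frames are grouped into short fixed-length blocks and that each block is treated as one pseudo-class. The function names, the block length, and the regularization constant are all hypothetical, and synthetic vectors stand in for MFCC features.

```python
import numpy as np

def sequential_pseudo_labels(n_frames, block_len):
    """Give each frame the index of the fixed-length block it falls in.
    Assumption: frames close in time mostly belong to the same speaker."""
    return np.arange(n_frames) // block_len

def fisher_subspace(X, labels, n_dims, reg=1e-6):
    """Return a (d, n_dims) projection maximizing the Fisher ratio
    (between-class over within-class scatter) for the given labels."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        diff = Xc - mc
        Sw += diff.T @ diff
        dm = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (dm @ dm.T)
    Sw += reg * np.eye(d)  # small ridge so the Cholesky factor exists
    # Whiten by Sw = L L^T, then solve an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:n_dims]
    return Linv.T @ evecs[:, order]

# Synthetic stand-in for MFCC frames: two speakers alternating every
# 200 frames, differing only along feature 0; other features are noise.
rng = np.random.default_rng(0)
speaker = (np.arange(2000) // 200) % 2
X = rng.normal(size=(2000, 6)) * np.array([1.0, 3.0, 3.0, 3.0, 3.0, 3.0])
X[:, 0] += np.where(speaker == 1, 2.0, -2.0)

# No speaker labels are used: blocks of 20 frames act as pseudo-classes.
W = fisher_subspace(X, sequential_pseudo_labels(len(X), 20), n_dims=1)
Z = X @ W  # frames projected onto the learned 1-D subspace
```

In this toy setting the pseudo-labels are never compared with the true speaker identities, yet the leading direction should concentrate on the one feature that separates the speakers, because blocks drawn from the same speaker have similar means. Blocks that straddle a speaker change violate the assumption, which is one reason a subspace obtained this way can only be near-optimal relative to FLD with true labels.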