DocumentCode :
1300757
Title :
Partially Supervised Speaker Clustering
Author :
Tang, Hao ; Chu, Stephen Mingyu ; Hasegawa-Johnson, Mark ; Huang, Thomas S.
Author_Institution :
HP Labs., Palo Alto, CA, USA
Volume :
34
Issue :
5
fYear :
2012
fDate :
5/1/2012 12:00:00 AM
Firstpage :
959
Lastpage :
971
Abstract :
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content with the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework.
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
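The contrast the abstract draws between the cosine and Euclidean distance metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the three-element vectors are toy stand-ins for GMM mean supervectors, the function names are hypothetical, and cosine distance is assumed here to be one minus cosine similarity, which is one common definition and may differ from the one used in the paper.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector length as well as direction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus cosine similarity (assumed definition); depends on direction only."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy stand-ins for supervectors (illustrative values, not data from the paper).
utterance_a = [1.0, 2.0, 3.0]
utterance_b = [2.0, 4.0, 6.0]  # same direction as utterance_a, twice the length
utterance_c = [3.0, 2.0, 1.0]  # same length as utterance_a, different direction

for name, other in (("b", utterance_b), ("c", utterance_c)):
    print(
        f"a vs {name}: "
        f"euclidean={euclidean_distance(utterance_a, other):.3f}, "
        f"cosine={cosine_distance(utterance_a, other):.3f}"
    )
```

In this toy setting the Euclidean metric ranks utterance_c as closer to utterance_a than utterance_b is, while the cosine metric treats utterance_a and utterance_b as essentially identical because they point in the same direction. If utterances from one speaker scatter mainly along a direction, as the abstract's directional scattering property suggests, a direction-only metric is the more natural choice.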
Keywords :
Gaussian processes; content-based retrieval; graph theory; learning (artificial intelligence); multimedia databases; pattern clustering; speaker recognition; GALE database; GMM mean supervector representation; Gaussian mixture model mean supervector representation; content-based multimedia indexing; content-based multimedia processing; content-based multimedia retrieval; cosine distance metric; directional scattering property; discriminative speaker subspace; distance metric learning algorithm-linear spherical discriminant analysis; multimedia databases; partially supervised speaker clustering; speaker-discriminative acoustic feature transformation; speaker-discriminative distance metric; speech utterance; statistical model-based distance metrics; universal speaker prior model; unsupervised speaker clustering process; vector-based distance metrics; Acoustics; Feature extraction; Measurement; Pipelines; Speech; Training; Training data; Speaker clustering; distance metric learning; partial supervision; Artificial Intelligence; Cluster Analysis; Discriminant Analysis; Humans; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; Speech;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2011.174
Filename :
5989833
Link To Document :