Title :
Efficient speaker search over large populations using kernelized locality-sensitive hashing
Author :
Jeon, Woojay ; Cheng, Yan-Ming
Author_Institution :
Samsung Electron., Suwon, South Korea
Abstract :
We propose a novel method of efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. The model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach because of its computational simplicity, while maintaining high accuracy. Furthermore, efficiency can be drastically improved by approximating searches using kernelized locality-sensitive hashing (KLSH). From a speaker's utterance, a set of statistics is extracted according to the utterance comparison model and converted to a set of hash key bits. An approximate nearest neighbor search using the Hamming distance then finds candidate matches for the query speaker, which are rank-ordered by linearly comparing them with the query using the utterance comparison model. Compared to GMM-based speaker identification and some of its variants that have been proposed to increase its efficiency, the proposed KLSH-based method is orders of magnitude faster while sacrificing a negligible amount of accuracy for sufficiently long query utterances. At a more fundamental level, we also discuss how our speaker matching framework differs from the traditional Bayesian decision rule used for speaker identification.
Keywords :
Bayes methods; Gaussian processes; approximation theory; search problems; speaker recognition; Bayesian decision rule; GMM-based approach; GMM-based speaker identification; Gaussian mixture model-based approach; KLSH-based method; approximate nearest neighbor search; Hamming distance; kernelized locality-sensitive hashing; query speaker; speaker matching framework; speaker search; utterance comparison model; Computational modeling; Kernel; Mathematical model; Sociology; Speech; Statistics; Vectors; lsh; speaker identification
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288860