DocumentCode
112680
Title
Robust Speaker Verification With Joint Sparse Coding Over Learned Dictionaries
Author
Haris, B.C. ; Sinha, R.
Author_Institution
IIT Guwahati, Guwahati, India
Volume
10
Issue
10
fYear
2015
fDate
Oct. 2015
Firstpage
2143
Lastpage
2157
Abstract
This paper presents a novel paradigm for speaker verification (SV) that exploits sparse representation (SR) over a learned dictionary. The proposed approach is intended to overcome the shortcomings of existing SV systems based on SR over an exemplar dictionary. In this paper, supervectors created by concatenating the mean vectors of adapted Gaussian mixture models are used as speaker representations. Both simple and discriminative methods are explored for learning the dictionary in the supervector domain. The learned dictionary-based approach is further extended to compensate for session/channel variability by using joint sparse coding over speaker and channel dictionaries. The proposed systems are evaluated on the NIST 2012 SRE data set and are contrasted with the state-of-the-art i-vector probabilistic linear discriminant analysis-based SV system. The proposed system is found to possess the following attributes: 1) significantly higher performance at very low false-alarm rates, which makes the system attractive for high-security applications; 2) higher robustness to the short-duration test data condition; 3) competitive robustness to additive noise in test data; and 4) much lower computational complexity. Even when compared with the fastest i-vector computation methods reported in the literature, the complexity of the proposed system is found to be comparable. With these features, the proposed approach seems to be a promising candidate for practical voice biometric applications.
Keywords
Gaussian processes; computational complexity; mixture models; signal representation; speaker recognition; speech coding; NIST 2012 SRE data set; SR; adapted Gaussian mixture models; additive noise; false alarm rates; i-vector probabilistic linear discriminant analysis-based SV system; joint sparse coding; learned dictionary; mean vector concatenation; session-channel variability; sparse representation; speaker representations; speaker verification; Dictionaries; Encoding; Indexes; Joints; Measurement; Robustness; Training; GMM supervector; Voice biometrics; sparse representation classification
fLanguage
English
Journal_Title
Information Forensics and Security, IEEE Transactions on
Publisher
IEEE
ISSN
1556-6013
Type
jour
DOI
10.1109/TIFS.2015.2450674
Filename
7138605