Robust Speaker Verification With Joint Sparse Coding Over Learned Dictionaries

Author

Haris, B.C. ; Sinha, R.

Author_Institution

IIT Guwahati, Guwahati, India

Volume

10

Issue

10

fYear

2015

fDate

Oct. 2015

Firstpage

2143

Lastpage

2157

Abstract

This paper presents a novel paradigm for speaker verification (SV) that exploits sparse representation (SR) over a learned dictionary. The proposed approach is intended to overcome the shortcomings of existing SV systems based on SR over an exemplar dictionary. Supervectors, created by concatenating the mean vectors of adapted Gaussian mixture models, are used as speaker representations. Both simple and discriminative methods are explored for learning the dictionary in the supervector domain. The learned dictionary-based approach is further extended to compensate for session/channel variability by using joint sparse coding over speaker and channel dictionaries. The proposed systems are evaluated on the NIST 2012 SRE data set and are contrasted with the state-of-the-art i-vector probabilistic linear discriminant analysis-based SV system. The proposed system is found to possess the following attributes: 1) significantly higher performance at very low false-alarm rates, which makes the system attractive for high-security applications; 2) higher robustness to short-duration test data; 3) competitive robustness to additive noise in test data; and 4) much lower computational complexity. Even when compared with the fastest i-vector computation methods reported in the literature, the complexity of the proposed system is found to be comparable. With these features, the proposed approach seems to be a promising candidate for practical voice biometric applications.
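
The joint sparse coding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy orthogonal matching pursuit solver, the function names (make_supervector, omp, joint_sparse_code, cosine_score), the sparsity level, and the cosine scoring of speaker-side codes are all assumptions made for illustration. The sketch concatenates a speaker dictionary and a channel dictionary, sparse-codes a supervector over the combined dictionary, and keeps only the speaker-side coefficients for comparison.

```python
import numpy as np

def make_supervector(adapted_means):
    """Concatenate the mean vectors of an adapted GMM (n_mixtures x dim) into one vector."""
    return np.asarray(adapted_means, dtype=float).reshape(-1)

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (illustrative solver, an assumption here).

    Approximates x using at most n_nonzero columns (atoms) of D.
    """
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def joint_sparse_code(x, speaker_dict, channel_dict, n_nonzero):
    """Sparse-code x over [speaker_dict | channel_dict]; return the two coefficient parts."""
    D = np.hstack([speaker_dict, channel_dict])
    code = omp(D, x, n_nonzero)
    k = speaker_dict.shape[1]
    return code[:k], code[k:]

def cosine_score(a, b):
    """Cosine similarity between two speaker-side codes (hypothetical scoring choice)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

In this sketch, the channel-side coefficients absorb session/channel variation and are discarded, while the speaker-side coefficients of the enrollment and test supervectors are compared to produce a verification score. In practice the two dictionaries would be learned from training supervectors rather than generated synthetically.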

Keywords

Gaussian processes; computational complexity; mixture models; signal representation; speaker recognition; speech coding; NIST 2012 SRE data set; SR; adapted Gaussian mixture models; additive noise; false alarm rates; i-vector probabilistic linear discriminant analysis-based SV system; joint sparse coding; learned dictionary; mean vector concatenation; session-channel variability; sparse representation; speaker representations; speaker verification; Dictionaries; Encoding; Indexes; Joints; Measurement; Robustness; Training; GMM supervector; Voice biometrics; sparse representation classification

fLanguage

English

Journal_Title

Information Forensics and Security, IEEE Transactions on

Publisher

IEEE

ISSN

1556-6013

Type

jour

DOI

10.1109/TIFS.2015.2450674

Filename

7138605