Memory-aware i-vector extraction by means of sub-space factorization

Author

Cumani, Sandro ; Laface, Pietro

Author_Institution

Politecnico di Torino, Turin, Italy

fYear

2015

fDate

19-24 April 2015

Firstpage

4669

Lastpage

4673

Abstract

Most state-of-the-art speaker recognition systems use i-vectors, a compact representation of spoken utterances. Because the “standard” i-vector extraction procedure requires large memory structures, we recently presented the Factorized Sub-space Estimation (FSE) approach, an efficient technique that dramatically reduces the memory needed for i-vector extraction and is also fast and accurate compared to other proposed approaches. FSE approximates the matrix T, which represents the speaker variability sub-space, by a product of appropriately designed matrices. In this work, we introduce and evaluate a further approximation of the matrices that contribute most to the memory costs in the FSE approach, showing that comparable system accuracy can be obtained using less than half of the FSE memory, which corresponds to a memory reduction of more than 60 times with respect to the standard method of i-vector extraction.
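
The abstract's central idea, replacing one large matrix by a product of smaller ones to save memory, can be illustrated with simple storage arithmetic. The sketch below is a generic two-factor low-rank example and is not the FSE factorization itself, whose specific matrix design is given in the paper. The function names and the sizes (`ROWS`, `COLS`, `RANK`) are hypothetical and chosen only for illustration.

```python
# Illustrative only: generic low-rank storage arithmetic, NOT the FSE algorithm.
# All names and sizes below are assumptions made for this example.


def dense_params(rows: int, cols: int) -> int:
    """Number of values stored for a dense rows x cols matrix."""
    return rows * cols


def factored_params(rows: int, cols: int, rank: int) -> int:
    """Values stored when the matrix is kept as A (rows x rank) times B (rank x cols)."""
    return rank * (rows + cols)


def reduction_factor(rows: int, cols: int, rank: int) -> float:
    """Ratio of dense storage to factored storage (values > 1 mean a saving)."""
    return dense_params(rows, cols) / factored_params(rows, cols, rank)


# Hypothetical sizes, not taken from the paper.
ROWS, COLS, RANK = 1000, 400, 50

if __name__ == "__main__":
    print("dense values:   ", dense_params(ROWS, COLS))
    print("factored values:", factored_params(ROWS, COLS, RANK))
    print("reduction:      ", round(reduction_factor(ROWS, COLS, RANK), 2))
```

The saving grows as the chosen rank shrinks relative to the matrix dimensions; whether accuracy is preserved depends on how well the product approximates the original matrix, which is the question the paper evaluates for its own factorization.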

Keywords

approximation theory; feature extraction; matrix decomposition; speaker recognition; FSE; factorized subspace estimation; i-vectors; matrix T approximation; memory costs; speaker variability subspace; standard i-vector extraction procedure; state-of-the-art speaker recognition systems; Continuous wavelet transforms; I-vector extraction; I-vectors; Probabilistic Linear Discriminant Analysis; Speaker Recognition; matrix rotation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178856

Filename

7178856