Title :
Speaker adaptation based on the multilinear decomposition of training speaker models
Author_Institution :
Sch. of Electr. Eng., Pusan Nat. Univ., Busan, South Korea
Abstract :
This paper presents a novel speaker adaptation method based on multilinear analysis of training speaker models using Tucker decomposition. The Tucker decomposition decouples the training models into three subspaces: state, mean-vector dimension, and speaker. Using the bases of the state subspace, we derive a speaker adaptation formula in which the matrix of basis vectors is weighted in both its row and column spaces; the proposed method includes the eigenvoice technique as a special case. In an isolated-word recognition task, the Tucker decomposition-based method outperformed both eigenvoice and maximum likelihood linear regression (MLLR) when 15 seconds or more of adaptation data were available. Furthermore, the method extends readily to multi-factor problems, enabling adaptation to multiple factors such as speaker and noise environment.
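The decoupling described in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's adaptation formula. It is a generic truncated higher-order SVD (one common way to compute a Tucker decomposition) applied to a random tensor standing in for the training mean vectors. The tensor shape (6 states, 4 mean-vector dimensions, 5 speakers), the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct`, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # put the target axis last
    result = moved @ matrix.T               # contract it against the matrix
    return np.moveaxis(result, -1, mode)    # restore the axis order


def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns (core, [factor per mode])."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])         # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each basis
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from the core and the per-mode bases."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical stand-in for the training models: states x dimension x speakers.
rng = np.random.default_rng(0)
means = rng.standard_normal((6, 4, 5))

# Full ranks are kept here, so the decomposition is exact up to rounding.
core, factors = hosvd(means, ranks=(6, 4, 5))
u_state, u_dim, u_speaker = factors

# One speaker's means = shared (state, dimension) bases weighted by that
# speaker's row of the speaker-mode factor matrix.
shared = mode_product(mode_product(core, u_state, 0), u_dim, 1)
speaker_0 = shared @ u_speaker[0]           # shape (6, 4)
```

In this sketch each speaker is represented by a short weight vector over bases shared by all training speakers, which is the general idea that subspace methods such as eigenvoice also rely on. Choosing smaller values in `ranks` gives a lower-dimensional approximation instead of an exact reconstruction.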
Keywords :
eigenvalues and eigenfunctions; noise (working environment); speaker recognition; speech recognition; Tucker decomposition; eigenvoice method; mean vector; multilinear analysis; multilinear decomposition; noise environment; speaker adaptation; training speaker models; Adaptive arrays; Algebra; Clustering methods; Matrix decomposition; Maximum likelihood linear regression; Principal component analysis; Tensile stress; Testing; Working environment noise
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495117