Title :
Perturbation and pitch normalization as enhancements to speaker recognition
Author :
Lawson, A. ; Linderman, M. ; Leonard, M. ; Stauffer, A. ; Pokines, B. ; Carlin, M.
Author_Institution :
RADC, Inc., CA
Abstract :
This study proposes an approach to improving speaker recognition through minute vocal tract length perturbation of training files, coupled with pitch normalization of both training and test data. The notion of perturbation as a method for improving the robustness of training data for supervised classification is taken from the field of optical character recognition, where distorting characters within a certain range has yielded strong improvements across disparate conditions. This paper demonstrates that acoustic perturbation, in this case the analysis, distortion, and resynthesis of vocal tract length for a given speaker, significantly improves speaker recognition when the resulting files are used to augment or replace the training data. A pitch length normalization technique is also discussed which, combined with perturbation, reduces the equal error rate (EER) of open-set speaker recognition from 20% to 6.7%.
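The abstract reports its result as an equal error rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of how an EER can be estimated from two lists of verification scores. It is an illustration only, not the authors' scoring protocol, which this record does not describe. The function name `equal_error_rate` and the convention that a higher score means "more likely the target speaker" are assumptions.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that the "reject everything" operating point is considered.
    candidates = sorted(set(target_scores) | set(impostor_scores))
    candidates.append(candidates[-1] + 1.0)

    best_gap = None
    best_eer = None
    for threshold in candidates:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint, since the two rates rarely coincide exactly
            # on a finite set of trials.
            best_gap = gap
            best_eer = (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    targets = [2.1, 1.7, 0.9, 2.5]
    impostors = [0.2, 1.0, -0.4, 0.5]
    print(f"EER = {equal_error_rate(targets, impostors):.3f}")
```

With a finite number of trials the two error rates seldom cross exactly, so the sketch returns their midpoint at the threshold where they are closest. Interpolating along the detection error trade-off curve is a common refinement.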
Keywords :
learning (artificial intelligence); perturbation techniques; speaker recognition; speech synthesis; acoustic perturbation; minute vocal tract length perturbation; open set speaker recognition; optical character recognition; pitch length normalization; speaker recognition enhancement; supervised classification; training data; training files; Acoustic distortion; Character recognition; Loudspeakers; Optical character recognition software; Optical distortion; Robustness; Speaker recognition; Speech analysis; Speech synthesis; Training data
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960638