Title :
Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals
Author :
Poorjam, Amir Hossein ; Bahari, Mohamad Hasan ; Van hamme, Hugo
Author_Institution :
Center for Process. Speech & Images, KU Leuven, Leuven, Belgium
Abstract :
This paper proposes a novel approach for automatic estimation of four important traits of speakers, namely age, height, weight and smoking habit, from speech signals. In this method, each utterance is modeled using the i-vector framework, which is based on factor analysis of Gaussian Mixture Model (GMM) mean supervectors, and the Non-negative Factor Analysis (NFA) framework, which is based on constrained factor analysis of GMM weights. Then, Artificial Neural Networks (ANNs) and Least Squares Support Vector Regression (LSSVR) are employed to estimate the age, height and weight of speakers from given utterances, and ANNs and logistic regression (LR) are utilized to perform smoking habit detection. Since GMM weights provide complementary information to GMM means, a score-level fusion of the i-vector-based and the NFA-based recognizers is considered for the age and smoking habit estimation tasks to improve the performance. In addition, a multitask speaker profiling approach is proposed to evaluate the correlated tasks simultaneously and in interaction with each other, and consequently, to boost the accuracy of speaker age, height, weight and smoking habit estimation. To this end, a hybrid architecture involving the score-level fusion of the i-vector-based and the NFA-based recognizers is proposed to exploit the available information in both Gaussian means and Gaussian weights. ANNs are then employed to share the learned information with all tasks while they are learned in parallel. The proposed method is evaluated on telephone speech signals from the National Institute of Standards and Technology (NIST) 2008 and 2010 Speaker Recognition Evaluation (SRE) corpora. Experimental results over 1194 utterances show the effectiveness of the proposed method in automatic speaker profiling.
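The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two recognizers' outputs are combined, so the convex (weighted-average) combination and the grid search over the weight below are assumptions, not the authors' method. The function names and the development-set numbers are hypothetical and made up purely for illustration.

```python
def fuse_scores(ivector_scores, nfa_scores, alpha):
    """Convex combination of two recognizers' per-utterance outputs.

    alpha weights the i-vector-based output; (1 - alpha) weights the
    NFA-based output. This linear rule is an assumed fusion scheme.
    """
    return [alpha * a + (1.0 - alpha) * b
            for a, b in zip(ivector_scores, nfa_scores)]


def mean_absolute_error(predicted, target):
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def pick_fusion_weight(ivector_scores, nfa_scores, target, steps=100):
    """Grid-search alpha in [0, 1] that minimizes MAE on development data."""
    best_alpha, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        alpha = k / steps
        fused = fuse_scores(ivector_scores, nfa_scores, alpha)
        err = mean_absolute_error(fused, target)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Hypothetical development-set age estimates in years (not from the paper).
dev_ivector = [30.0, 42.0, 55.0, 61.0]   # i-vector-based recognizer output
dev_nfa = [26.0, 38.0, 51.0, 57.0]       # NFA-based recognizer output
dev_true = [28.0, 40.0, 53.0, 59.0]      # reference ages

alpha, err = pick_fusion_weight(dev_ivector, dev_nfa, dev_true)
```

In this toy data the two recognizers err in opposite directions, which is the situation in which fusing complementary outputs helps most; with real systems the fusion weight would be tuned on held-out data and then applied unchanged to the test utterances.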
Keywords :
Gaussian processes; least squares approximations; mixture models; neural nets; regression analysis; speaker recognition; support vector machines; ANN; GMM mean supervectors; GMM weights; Gaussian means; Gaussian mixture model; Gaussian weights; LR; LSSVR; NFA framework; NFA-based recognizers; NIST; National Institute for Standards and Technology; SRE corpora; age estimation; artificial neural networks; automatic multitask speaker profiling approach; automatic speaker trait estimation; constrained factor analysis; correlated task evaluation; factor analysis; height estimation; hybrid architecture; i-vector framework; i-vector-based recognizers; learned information sharing; least squares support vector regression; logistic regression; nonnegative factor analysis framework; performance improvement; score-level fusion; smoking habit detection; smoking habit estimation; speaker recognition evaluation corpora; spontaneous telephone speech signals; utterance modelling; weight estimation; Estimation; Kernel; Speech; Support vector machine classification; Testing; Training; Vectors; Artificial Neural Networks; Multitask Speaker Characterization; Non-negative Factor Analysis; i-vector;
Conference_Title :
Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-5486-5
DOI :
10.1109/ICCKE.2014.6993339