Title :
Improved Semi-Parametric Mean Trajectory Model Using Discriminatively Trained Centroids
Author :
Xu, Ran ; Pan, Jielin ; Yan, Yonghong
Author_Institution :
ThinkIT Speech Lab., Inst. of Acoust., Chinese Acad. of Sci., Beijing, China
Abstract :
To alleviate the limitation imposed by the "state output probability conditional independence" assumption of hidden Markov models (HMMs) in speech recognition, a discriminative semi-parametric trajectory model has been proposed in recent years, in which both the means and the variances of the acoustic models are modeled as time-varying variables. The time-varying information is modeled as a weighted contribution from all the "centroids", which can be viewed as a representation of the acoustic space. In the previous literature, such centroids are often obtained either by clustering the Gaussians of the baseline acoustic models down to a reasonable number or by training a baseline model with fewer Gaussian components. Centroids obtained in this way are maximum likelihood estimates of the acoustic space, and are relatively weak in discriminability compared with discriminatively trained acoustic models. In this paper, we propose an improved training framework for the semi-parametric mean trajectory model, in which the centroids are first discriminatively trained with the minimum phone error criterion to provide a more discriminative representation of the acoustic space. The method was evaluated on a Mandarin digit string recognition task. The experimental results show that the proposed method achieves a relative string error rate reduction of 7.5% over the traditional discriminative semi-parametric trajectory model and a relative string error rate reduction of 28.6% over the baseline acoustic model trained with the maximum likelihood criterion.
Keywords :
hidden Markov models; maximum likelihood estimation; speech recognition; discriminatively trained centroids; semi-parametric mean trajectory model; time-varying information; Acoustics; Error analysis; Gaussian processes; Laboratories; Mutual information; Radio access networks; Vocabulary;
Conference_Title :
Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2942-4
Electronic_ISBN :
978-1-4244-2943-1
DOI :
10.1109/CHINSL.2008.ECP.63