Title :
Language recognition using deep-structured conditional random fields
Author :
Yu, Dong ; Wang, Shizhen ; Karam, Zahi ; Deng, Li
Author_Institution :
Microsoft Res., Redmond, WA, USA
Abstract :
We present a novel language identification technique using our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which each higher layer's input observation sequence consists of the lower layer's observation sequence and the resulting lower layer's frame-level marginal probabilities. In this paper we extend the original deep-structured CRF by allowing for distinct state representations at different layers and demonstrate its benefits. We propose an unsupervised algorithm to pre-train the intermediate layers by casting it as a multi-objective programming problem that is aimed at minimizing the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task showed that our approach can achieve a routing accuracy (RA) of 86.4% and average equal error rate (EER) of 6.6%. These results are significantly better than the 82.5% RA and 7.5% average EER obtained using the Gaussian mixture model trained with the maximum mutual information criterion but slightly worse than the 87.7% RA and 6.4% EER achieved using the support vector machine with model pushing on the Gaussian super vector (GSV).
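The two entropy terms named in the abstract, and the way a higher layer's input is assembled, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `eps` smoothing constant, and the weighted-sum combination with a hypothetical `weight` parameter are all assumptions made for illustration, since the abstract only states that pre-training is cast as a multi-objective problem. The array `post` is assumed to hold frame-level state posteriors with shape (frames, states) and rows summing to one.

```python
import numpy as np


def frame_conditional_entropy(post, eps=1e-12):
    """Average, over frames, of the entropy of each frame's state posterior."""
    return float(-np.mean(np.sum(post * np.log(post + eps), axis=1)))


def state_occupation_entropy(post, eps=1e-12):
    """Entropy of the state occupation distribution (posteriors averaged over frames)."""
    occupation = post.mean(axis=0)
    return float(-np.sum(occupation * np.log(occupation + eps)))


def pretrain_objective(post, weight=1.0):
    """Assumed scalarisation: lower is better.

    Small when each frame is assigned confidently (low conditional entropy)
    and the states are used evenly (high occupation entropy).
    """
    return frame_conditional_entropy(post) - weight * state_occupation_entropy(post)


def next_layer_input(obs, post):
    """Concatenate a layer's observations with its frame-level marginals."""
    return np.concatenate([obs, post], axis=1)


if __name__ == "__main__":
    sharp = np.eye(4)               # 4 frames, each confidently in a different state
    uniform = np.full((4, 4), 0.25)  # 4 frames, every state equally likely
    print(pretrain_objective(sharp), pretrain_objective(uniform))
    print(next_layer_input(np.zeros((4, 3)), sharp).shape)
```

Under this assumed scalarisation, confident and evenly spread posteriors score lower than uniform ones, and posteriors that collapse onto a single state gain nothing from their low conditional entropy because the occupation entropy is also near zero.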
Keywords :
entropy; random processes; speech recognition; support vector machines; voice mail; Gaussian super vector; deep-structured conditional random fields; equal error rate; frame-level conditional entropy; frame-level marginal probabilities; language identification technique; language recognition; multiobjective programming problem; observation sequence; state occupation entropy; support vector machine; voice mail routing task; Automatic speech recognition; Casting; Entropy; Error analysis; Mutual information; Routing; Support vector machine classification; Support vector machines; Unsupervised learning; Voice mail; conditional random field; deep learning; deep-structure; language identification; unsupervised learning;
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495072