Regional Information Center for Science and Technology - Language recognition using deep-structured conditional random fields

DocumentCode :

2790650

Title :

Language recognition using deep-structured conditional random fields

Author :

Yu, Dong ; Wang, Shizhen ; Karam, Zahi ; Deng, Li

Author_Institution :

Microsoft Res., Redmond, WA, USA

fYear :

2010

fDate :

14-19 March 2010

Firstpage :

5030

Lastpage :

5033

Abstract :

We present a novel language identification technique using our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which each higher layer's input observation sequence consists of the lower layer's observation sequence and the resulting lower layer's frame-level marginal probabilities. In this paper we extend the original deep-structured CRF by allowing for distinct state representations at different layers and demonstrate its benefits. We propose an unsupervised algorithm to pre-train the intermediate layers by casting it as a multi-objective programming problem that is aimed at minimizing the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task showed that our approach can achieve a routing accuracy (RA) of 86.4% and average equal error rate (EER) of 6.6%. These results are significantly better than the 82.5% RA and 7.5% average EER obtained using the Gaussian mixture model trained with the maximum mutual information criterion but slightly worse than the 87.7% RA and 6.4% EER achieved using the support vector machine with model pushing on the Gaussian super vector (GSV).
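
The layer stacking and the two entropy terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `weight` parameter, and the weighted-difference combination of the two objectives are assumptions made for illustration, and the frame-level marginals are taken as given rather than computed by CRF inference.

```python
import numpy as np


def stack_layer_input(observations, marginals):
    """Build a higher layer's input: the lower layer's observations
    concatenated with its frame-level marginals, shapes (T, D) and (T, S)."""
    return np.concatenate([observations, marginals], axis=1)


def frame_conditional_entropy(marginals, eps=1e-12):
    """Average over frames of the entropy of each frame's state posterior."""
    p = np.clip(marginals, eps, 1.0)
    return float(-(p * np.log(p)).sum(axis=1).mean())


def state_occupation_entropy(marginals, eps=1e-12):
    """Entropy of the state distribution obtained by averaging over frames."""
    occupancy = np.clip(marginals.mean(axis=0), eps, 1.0)
    return float(-(occupancy * np.log(occupancy)).sum())


def pretraining_objective(marginals, weight=1.0):
    """Hypothetical scalarisation (an assumption, not taken from the paper):
    lower values mean confident frames and evenly used states."""
    return (frame_conditional_entropy(marginals)
            - weight * state_occupation_entropy(marginals))


# Toy marginals for 4 frames and 2 states.
confident = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
uniform = np.full((4, 2), 0.5)
features = np.zeros((4, 3))  # placeholder acoustic features

stacked = stack_layer_input(features, confident)
```

On these toy inputs, the confident and evenly occupied marginals score lower than the uniform ones, which is the behaviour the two criteria jointly favour: each frame commits to a state, while no single state absorbs all frames.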

Keywords :

entropy; random processes; speech recognition; support vector machines; voice mail; Gaussian super vector; deep-structured conditional random fields; equal error rate; frame-level conditional entropy; frame-level marginal probabilities; language identification technique; language recognition; multiobjective programming problem; observation sequence; state occupation entropy; support vector machine; voice mail routing task; Automatic speech recognition; Casting; Entropy; Error analysis; Mutual information; Routing; Support vector machine classification; Support vector machines; Unsupervised learning; Voice mail; conditional random field; deep learning; deep-structure; language identification; unsupervised learning;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location :

Dallas, TX

ISSN :

1520-6149

Print_ISBN :

978-1-4244-4295-9

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2010.5495072

Filename :

5495072

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2790650