Regional Information Center for Science and Technology (RICEST) - Sample selection for automatic language identification

DocumentCode :

3423179

Title :

Sample selection for automatic language identification

Author :

Farris, David ; White, Chris ; Khudanpur, Sanjeev

Author_Institution :

Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD

fYear :

2008

fDate :

March 31 - April 4, 2008

Firstpage :

4225

Lastpage :

4228

Abstract :

Current approaches to automatic spoken language identification (LID) assume the availability of a large corpus of manually language-labeled speech samples for training statistical classifiers. We investigate two methods of active learning to significantly reduce the amount of labeled speech needed for training LID systems. Starting with a small training set, an automated method is used to select samples from a corpus of unlabeled speech, which are then labeled and added to the training pool. One selection method is based on a previously known entropy criterion, and the other on a novel likelihood-ratio criterion. We demonstrate LID performance comparable to that of a large training corpus using only a tenth of the training data. A further 40% improvement in LID performance is obtained using a third of the training data. Finally, we show that our novel selection method is more robust to variance in the unlabeled pool than the entropy-based method.
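
The entropy-based selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy posterior values, and the choice of picking the top-k samples per round are all assumptions made for illustration. The likelihood-ratio criterion is not sketched because the abstract does not define it.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of a posterior distribution over languages."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def select_by_entropy(pool_posteriors, k):
    """Return indices of the k unlabeled samples the classifier is least sure about.

    pool_posteriors: one list of per-language posterior probabilities per sample
    (hypothetical classifier output; any LID model producing posteriors would do).
    """
    ranked = sorted(
        range(len(pool_posteriors)),
        key=lambda i: entropy(pool_posteriors[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:k]


# Toy unlabeled pool: posteriors over three languages (made-up numbers).
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # near-uniform, highest entropy
]

# Samples chosen here would be sent for manual labeling and added to the training set.
selected = select_by_entropy(pool, 2)
print(selected)
```

In an active-learning loop, the classifier would be retrained after each batch of newly labeled samples and the remaining pool rescored before the next selection.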

Keywords :

entropy; natural language processing; speech recognition; automatic language identification; entropy criterion; language-labeled speech samples; likelihood-ratio criterion; sample selection; spoken language identification; statistical classifiers; Costs; Error analysis; Iterative algorithms; Iterative methods; Natural languages; Partitioning algorithms; Sampling methods; Speech processing; Training data; Uncertainty; unsupervised learning

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location :

Las Vegas, NV

ISSN :

1520-6149

Print_ISBN :

978-1-4244-1483-3

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2008.4518587

Filename :

4518587

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3423179