DocumentCode :
3585058
Title :
Training candidate selection for effective rejection in open-set language identification
Author :
Zhang, Qian ; Hansen, John H. L.
Author_Institution :
Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA
fYear :
2014
Firstpage :
384
Lastpage :
389
Abstract :
Research in open-set language identification (LID) generally focuses more on accurate in-set modeling than on improved out-of-set (OOS) rejection. Rejecting unknown or OOS languages is a challenge, since developers seldom commit as much effort to OOS corpus development as to the desired in-set languages. To address this, we propose an OOS candidate selection method for universal OOS language coverage. Since effective selection requires substantial knowledge of inter-language relationships, three broad measurements across world languages are considered. The proposed OOS selection method is evaluated on a database derived from a large-scale corpus (LRE-09) with a state-of-the-art i-Vector system followed by two back-ends. The baseline system uses a random selection of OOS candidates. With the proposed selection method and a probabilistic linear discriminant analysis (PLDA) back-end, OOS rejection performance improves, with false alarm and miss rates achieving relative reductions of 32.6% and 4.4%, respectively. In addition, overall classification performance, measured by an average cost function, improves by a relative 8.4% and 7.5% for the two back-ends.
Keywords :
natural language processing; probability; LID; LRE-09; OOS corpus development; OOS language rejection; OOS language selection method; OOS rejection performance improvement; PLDA back-end; average cost function; baseline system; classification performance improvement; false alarm rates; i-Vector system; in-set language modeling; interlanguage relationship knowledge; large-scale corpus; miss rates; open-set language identification; out-of-set rejection; probabilistic linear discriminative analysis back-end; relative reduction; universal OOS language coverage; unknown language rejection; world languages; Abstracts; Acoustics; Pragmatics; Speech; LRE-09; Open-set language identification; Out-of-set identification; candidate selection; i-Vector; language distance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language Technology Workshop (SLT), 2014 IEEE
Type :
conf
DOI :
10.1109/SLT.2014.7078605
Filename :
7078605