DocumentCode :
2451869
Title :
Exploiting language cluster information for language pair identification
Author :
Jiang, Bing ; Song, Yan ; Dai, Li-Rong
Author_Institution :
Dept. of EEIS, Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2012
fDate :
16-18 July 2012
Firstpage :
1005
Lastpage :
1009
Abstract :
Recently, significant progress has been made in automatic language identification, perhaps most importantly through the application of discriminative training. However, efficient and effective discriminative training for language pair identification remains a challenging problem. Specifically, as the number of languages to be recognized increases, the number of language pairs grows quadratically, which can make discriminative training complex. Furthermore, it is difficult to collect data for certain languages, and the resulting data imbalance biases the discriminative training and degrades performance. To address these issues, we propose exploiting language cluster information for effective language pair identification. The approach is motivated by the linguistic observation that all languages can be divided into several families. In the proposed method, language clusters are constructed from linguistic knowledge, language pairs are categorized as “intra-cluster” or “inter-cluster”, and different weighting schemes are proposed for identification. To evaluate the effectiveness of the proposed scheme, we conduct extensive experiments on NIST LRE 2011 and on a small subset of languages selected from it. The experimental results show that exploiting the language cluster information significantly improves identification performance compared with the baseline system.
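The two structural points in the abstract, the quadratic growth in the number of language pairs and the split of pairs into intra-cluster and inter-cluster groups, can be illustrated with a minimal Python sketch. The cluster assignment (`LANGUAGE_CLUSTERS`) and the helper names `categorize_pairs` and `num_pairs` are hypothetical and chosen only for illustration; the paper's actual clusters and its weighting schemes are not described in this record, so no weighting is shown.

```python
from itertools import combinations

# Hypothetical language-to-cluster assignment, for illustration only.
# The paper builds its clusters from linguistic knowledge; its actual
# grouping is not given in this record.
LANGUAGE_CLUSTERS = {
    "english": "germanic",
    "german": "germanic",
    "spanish": "romance",
    "french": "romance",
    "russian": "slavic",
    "polish": "slavic",
}


def num_pairs(n):
    """Number of unordered language pairs among n languages: n(n-1)/2."""
    return n * (n - 1) // 2


def categorize_pairs(clusters):
    """Split every unordered language pair into intra- and inter-cluster lists."""
    intra, inter = [], []
    for a, b in combinations(sorted(clusters), 2):
        if clusters[a] == clusters[b]:
            intra.append((a, b))  # both languages share a cluster
        else:
            inter.append((a, b))  # languages belong to different clusters
    return intra, inter


intra, inter = categorize_pairs(LANGUAGE_CLUSTERS)
print(len(intra), len(inter), num_pairs(len(LANGUAGE_CLUSTERS)))
```

With six languages there are 15 pairs, of which 3 are intra-cluster and 12 are inter-cluster; doubling the language count to 12 already yields 66 pairs, which is the growth the abstract refers to.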
Keywords :
computational linguistics; natural language processing; pattern clustering; speech recognition; NIST LRE 2011; National Institute of Standards and Technology; baseline system; discriminative training; inter-cluster; intra-cluster; language cluster information; language clustering information; language pair identification; language recognition evaluation; linguistic knowledge; weighting schemes; NIST; Pragmatics; Speech; Speech recognition; Support vector machines; Training; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Audio, Language and Image Processing (ICALIP), 2012 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0173-2
Type :
conf
DOI :
10.1109/ICALIP.2012.6376762
Filename :
6376762