DocumentCode :
337488
Title :
An unsupervised approach to language identification
Author :
Pellegrino, F. ; André-Obrecht, R.
Author_Institution :
IRIT, Toulouse, France
Volume :
2
fYear :
1999
fDate :
15-19 Mar 1999
Firstpage :
833
Abstract :
This paper presents an unsupervised approach to automatic language identification (ALI) based on vowel system modeling. Each language's vowel system is modeled by a Gaussian mixture model (GMM) trained on automatically detected vowels. Since this detection is unsupervised and language independent, no labeled data are required. GMMs are initialized using an efficient data-driven variant of the LBG algorithm: the LBG-Rissanen (1983) algorithm. With 5 languages from the OGI MLTS corpus and in a closed-set identification task, we reach 79% correct identification using only the vowel segments detected in 45-second utterances for male speakers.
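The closed-set decision described in the abstract can be illustrated with a minimal sketch. It assumes the usual maximum-likelihood rule (score the utterance's vowel frames under each language's GMM and pick the best), which the abstract does not spell out. The function names, the diagonal covariances, and the toy two-component models are all hypothetical and are not taken from the paper; vowel detection and LBG-Rissanen initialization are not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of the frames under one GMM.

    gmm is a list of (weight, mean, variance) components. The per-frame
    mixture sum is computed with the log-sum-exp trick for stability.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total


def identify(frames, models):
    """Closed-set decision: return the best-scoring language and all scores."""
    scores = {lang: gmm_log_likelihood(frames, g) for lang, g in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models (2 components, 2-D features), for illustration only.
TOY_MODELS = {
    "lang_a": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (3.0, 3.0), (1.0, 1.0))],
    "lang_b": [(0.5, (-3.0, 0.0), (1.0, 1.0)), (0.5, (0.0, -3.0), (1.0, 1.0))],
}

# Stand-in for the feature vectors of the detected vowel segments.
frames = [(0.1, -0.2), (2.9, 3.1), (0.3, 0.1)]
best, scores = identify(frames, TOY_MODELS)
print(best)
```

With these toy values the frames lie near the component means of `lang_a`, so that model scores highest. In a real system each model would be trained on the vowel segments of one language, and the frames would be cepstral features rather than hand-written pairs.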
Keywords :
Gaussian processes; acoustic signal processing; natural languages; speech processing; unsupervised learning; 45 s; Gaussian mixture model; LBG-Rissanen algorithm; OGI MLTS corpus; acoustic processing; automatic speech processing; automatically detected vowels; close set identification task; correct identification; data-driven variant LBG algorithm; language identification; language independent detection; language vowel system; male speakers; unsupervised detection; utterances; vowel segments; vowel system modeling; Acoustic signal detection; Cepstral analysis; Databases; Entropy; Hidden Markov models; Modeling; Natural languages; Speech processing; Speech recognition; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location :
Phoenix, AZ
ISSN :
1520-6149
Print_ISBN :
0-7803-5041-3
Type :
conf
DOI :
10.1109/ICASSP.1999.759800
Filename :
759800