Title :
Discriminatively trained Language Models using Support Vector Machines for Language Identification
Author :
Zhai, Lu-Feng ; Siu, Man-Hung ; Yang, Xi ; Gish, Herbert
Author_Institution :
Hong Kong Univ. of Sci. & Technol.
Abstract :
In this paper, we explore the use of support vector machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that SVM-trained n-grams are effective classifiers but are sensitive to changes in prior class distributions. Using balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed the traditional n-gram in parallel phoneme recognition with language model and GMM-UBM-based language identification systems, achieving a relative error reduction of more than 30% on the OGI-TS corpus.
Keywords :
Gaussian distribution; natural languages; speech recognition; support vector machines; training; GMM-UBM; Gaussian mixture model; OGI-TS corpus; SVM; automatic language identification; balanced prior distribution; discriminatively trained language model; parallel phoneme recognition; score normalization; support vector machine; Acoustics; Engines; Maximum likelihood estimation; Pattern recognition; Power system modeling; Support vector machine classification; Testing; Training data
Conference_Title :
IEEE Odyssey 2006: The Speaker and Language Recognition Workshop, 2006
Conference_Location :
San Juan
Print_ISBN :
1-4244-0471-1
Electronic_ISBN :
1-4244-0472-X
DOI :
10.1109/ODYSSEY.2006.248098