Title :
Improved language identification using sampling rate compensation & gender based language models for Indian languages
Author :
Joshi, Devashree ; Joshi, S.D.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, New Delhi, India
Abstract :
A language identification system may receive a test language sample from various sources, such as a landline or mobile telephone call, a VoIP packet, a radio transmission, or a sample recorded on a computer. Often, the sampling rate of the test sample differs from that of the training samples used to build the language models in the back end. This mismatch degrades the performance of the system, so sampling rate compensation is required. It is proposed to introduce a sampling rate compensation block in the language identification system. Sampling rate compensation makes the system independent of sampling rate variation between the test and training language data. It is also proposed to use gender based language models. This approach improves language identification performance and additionally yields the gender of the speaker. The extracted gender information is useful for gender based user classification or for providing customized gender based services. Gender based language identification combined with sampling rate compensation has been found to improve results for both Vector Quantization codebook and Gaussian Mixture Model based language models. The sampling rate compensation block can also be readily used in speaker identification systems to improve performance.
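The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not say how the compensation block is built, so the example below assumes it amounts to resampling the test signal to the training sampling rate. It uses plain linear interpolation as a stand-in, with no anti-aliasing filter. The function names, the language labels and the scores are all hypothetical and are not taken from the paper.

```python
import math


def resample_linear(samples, src_rate, dst_rate):
    """Resample `samples` from src_rate to dst_rate by linear interpolation.

    Assumption: compensation is modelled as plain resampling. No
    anti-aliasing filter is applied, so a practical system would
    low-pass filter the signal before downsampling.
    """
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = min(int(math.floor(pos)), last)
        hi = min(lo + 1, last)             # clamp at the final sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def identify_language_and_gender(scores):
    """Pick the best-scoring gender based language model.

    `scores` maps (language, gender) to a model score where higher is
    better, e.g. a GMM log-likelihood or a negated VQ distortion.
    """
    (language, gender), _ = max(scores.items(), key=lambda kv: kv[1])
    return language, gender


# Test sample recorded at 8 kHz, models assumed to be trained at 16 kHz.
test_signal = [0, 1, 2, 3]
compensated = resample_linear(test_signal, 8000, 16000)

# Hypothetical scores of the compensated sample against each model.
example_scores = {
    ("Hindi", "male"): -41.2,
    ("Hindi", "female"): -39.8,
    ("Tamil", "male"): -45.0,
    ("Tamil", "female"): -44.1,
}
language, gender = identify_language_and_gender(example_scores)
```

Keying the models by (language, gender) means a single maximum over all scores returns both the language decision and the speaker gender, which is the side benefit the abstract describes.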
Keywords :
Gaussian processes; gender issues; information retrieval; natural language processing; pattern classification; sampling methods; speaker recognition; vector quantisation; Gaussian mixture model; Indian languages; customized gender based services; gender based language models; gender based user classification; gender information extraction; language identification system; sampling rate compensation; sampling rate variation; speaker gender identification; test language data; training language data; vector quantization codebook; Data models; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech processing; Training; Vector quantization; Gaussian Mixture modelling; Gender based language models; sampling rate compensation; vector quantisation;
Conference_Title :
Signal Processing, Computing and Control (ISPCC), 2013 IEEE International Conference on
Conference_Location :
Solan
Print_ISBN :
978-1-4673-6188-0
DOI :
10.1109/ISPCC.2013.6663413