Towards improving the performance of language identification system for Indian languages

Author

Anto, Abitha ; Sreekumar, K.T. ; Kumar, C. Santhosh ; Raj, P. C. Reghu

Author_Institution

Dept. of Comput. Sci. & Eng, Gov. Eng. Coll., Palakkad, India

fYear

2014

fDate

17-18 Dec. 2014

Firstpage

42

Lastpage

46

Abstract

In this paper, we present the details of a phonotactic language identification (LID) system developed for five Indian languages: English (Indian), Hindi, Malayalam, Tamil and Kannada. Since there are no publicly available speech databases for English, Malayalam and Kannada, we developed a database for each of the target languages by downloading audio files from YouTube videos and manually removing the non-speech signals. The system was tested on a data set consisting of 40 utterances with durations of 30, 10, and 3 s in each of the 5 target languages. Performance was evaluated separately for the 30 s, 10 s and 3 s segments, according to the NIST benchmarking sessions. The baseline system, tested with a 3-gram language model, gave an overall EER of 10.41 %, 19.56 % and 31.45 % for the 30, 10, and 3 s segments, respectively. The use of a 4-gram language model helped enhance the performance of the LID system to 9.81 %, 19.38 % and 32.77 %, respectively, for the 30, 10, and 3 s test segments. Further, by using n-gram smoothing, we were able to improve the EER of the LID system to 9.02 %, 18.70 % and 29.24 % for 3-gram language models and to 8.88 %, 16.46 % and 32.03 % for 4-gram language models, respectively, for the 30, 10, and 3 s test segments. The study shows that the use of 4-gram language models can help enhance the performance of LID systems for Indian languages.
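The phonotactic scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the phone recognizer has already turned each utterance into a sequence of phone labels, and it uses add-k smoothing purely as a stand-in, since the abstract does not say which n-gram smoothing method was used. All function names, the value of k and the toy phone sequences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_ngram_counts(phone_sequences, n):
    """Count n-grams and their (n-1)-phone contexts for one language."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for seq in phone_sequences:
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        vocab.update(padded)
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            ngrams[ctx + (padded[i],)] += 1
            contexts[ctx] += 1
    return ngrams, contexts, vocab


def log_likelihood(seq, model, n, k=0.5):
    """Average add-k smoothed log-probability of a phone sequence.

    Add-k smoothing is an assumption made for illustration only.
    """
    ngrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen phones
    padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        num = ngrams[ctx + (padded[i],)] + k
        den = contexts[ctx] + k * v
        total += math.log(num / den)
    # Normalize by the number of predicted symbols so that segments of
    # different durations give comparable scores.
    return total / (len(padded) - (n - 1))


def identify(seq, models, n, k=0.5):
    """Return the language whose n-gram model scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(seq, models[lang], n, k))


# Toy phone sequences standing in for phone-recognizer output.
toy_models = {
    "lang_A": train_ngram_counts([["a", "b", "a", "b", "a", "b"]], n=3),
    "lang_B": train_ngram_counts([["c", "d", "c", "d", "c", "d"]], n=3),
}
best = identify(["a", "b", "a"], toy_models, n=3)
```

Switching between the 3-gram and 4-gram configurations compared in the abstract corresponds to changing n in this sketch; without smoothing (k = 0), any n-gram unseen in training would receive zero probability and the logarithm would be undefined.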

Keywords

natural language processing; speech processing; 3-gram language model; 4-gram language model; English language; Hindi language; Indian languages; Kannada language; LID system; Malayalam language; NIST benchmarking; Tamil language; YouTube videos; audio file downloading; baseline system; language identification system; n-gram smoothing; nonspeech signal removal; overall EER improvement; performance improvement; phonotactic language identification system; target languages; test data set; test segments; utterance duration; Acoustics; Computational modeling; Data models; Databases; Smoothing methods; Speech; Speech recognition; Language Model; Phone Recognition followed by Language Modeling (PRLM); Phonotactic features; n-gram;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Systems and Communications (ICCSC), 2014 First International Conference on

Conference_Location

Trivandrum

Print_ISBN

978-1-4799-6012-5

Type

conf

DOI

10.1109/COMPSC.2014.7032618

Filename

7032618