DocumentCode :
2434242
Title :
Language identification using Fuzzy-SVM technique
Author :
Mishra, Girish ; Nitharwal, Sohan Lal ; Kaur, Sarvjeet
Author_Institution :
Sci. Anal. Group, Defence R&D Organ., Delhi, India
fYear :
2010
fDate :
29-31 July 2010
Firstpage :
1
Lastpage :
5
Abstract :
Language identification is an important problem in today's multilingual world. In this paper we analyze a Fuzzy-SVM technique for identifying romanized plaintexts in five Indian regional languages, namely Hindi, Bangla, Manipuri, Urdu and Kashmiri. Distinguishing features are extracted from romanized plaintexts of each of these five languages and represented through fuzzy sets on a normalized scale. These normalized feature vectors are given as input to a Support Vector Machine (SVM) based classifier. A Gaussian Radial Basis Function (RBF) kernel is used to construct the separating hyperplane in a higher-dimensional space. The proposed Pattern Recognition (PR) system does not depend on dictionaries of these languages and can identify plaintext even when word boundaries are unknown. This PR system (Language Identifier) can be used to automatically segregate plaintexts of these languages when analyzing intercepted, multiplexed and interleaved speech, data and fax communications. The proposed method significantly improves classification accuracy compared with other methods, even for short messages.
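The abstract describes a pipeline of feature extraction, fuzzy normalization and a Gaussian RBF kernel feeding an SVM. The sketch below illustrates only the stages up to the kernel (Gram) matrix, under stated assumptions: the feature set (single-letter frequencies), the linear membership ramp between per-feature training minimum and maximum, and the kernel width `gamma=0.5` are all hypothetical choices, since the abstract does not give the paper's actual features, membership functions or kernel parameters. The SVM optimization itself, which would consume the Gram matrix, is not shown.

```python
import math
from collections import Counter
from string import ascii_lowercase


def letter_features(text):
    """Hypothetical feature vector: relative frequency of each letter a-z.

    The abstract does not list the paper's actual distinguishing features,
    so single-letter frequencies stand in for them here.
    """
    letters = [c for c in text.lower() if c in ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero on empty input
    return [counts[c] / total for c in ascii_lowercase]


def fit_membership(train_vectors):
    """Record per-feature (min, max) over the training vectors.

    Assumed membership function: a linear ramp from 0 at the observed
    minimum to 1 at the observed maximum of each feature.
    """
    columns = list(zip(*train_vectors))
    return [(min(col), max(col)) for col in columns]


def membership(vec, bounds):
    """Map raw feature values to fuzzy membership degrees in [0, 1]."""
    out = []
    for v, (lo, hi) in zip(vec, bounds):
        if hi == lo:
            out.append(0.0)  # constant feature carries no information
        else:
            # Clamp so unseen values outside the training range stay in [0, 1].
            out.append(min(1.0, max(0.0, (v - lo) / (hi - lo))))
    return out


def gaussian_rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2).

    gamma=0.5 is an illustrative default, not a value from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(vectors, gamma=0.5):
    """Pairwise kernel (Gram) matrix that an SVM solver would consume."""
    return [[gaussian_rbf(u, v, gamma) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Illustrative romanized strings only; not data from the paper.
    samples = ["mera naam ram hai", "amar nam ram", "ei boi ta bhalo"]
    raw = [letter_features(s) for s in samples]
    bounds = fit_membership(raw)
    fuzzy = [membership(r, bounds) for r in raw]
    for row in gram_matrix(fuzzy):
        print(["%.3f" % k for k in row])
```

Because the kernel depends only on squared distances between membership vectors, every diagonal entry of the Gram matrix is 1.0, and putting all features on the same [0, 1] scale keeps any single feature from dominating that distance.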
Keywords :
feature extraction; fuzzy set theory; natural language processing; pattern classification; radial basis function networks; support vector machines; text analysis; Bangla; Gaussian radial basis kernel function; Hindi; Indian regional language; Kashmiri; Manipuri; PR system; Urdu; feature extraction; feature vector; fuzzy SVM technique; fuzzy set; interleaved data communication; language identification; language identifier; language plaintext; pattern recognition system; plain text segregation; support vector machine based classifier; Classification algorithms; Feature extraction; Fuzzy sets; Kernel; Support vector machine classification; Training;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computing Communication and Networking Technologies (ICCCNT), 2010 International Conference on
Conference_Location :
Karur
Print_ISBN :
978-1-4244-6591-0
Type :
conf
DOI :
10.1109/ICCCNT.2010.5592553
Filename :
5592553