Title :
Automatic detection of English words in Benglish text: A statistical approach
Author :
Kundu, Bijoy ; Chandra, Swarup
Author_Institution :
Language Technol., Centre for Dev. of Adv. Comput. (CDAC), Kolkata, India
Abstract :
Code-mixing and code-switching create challenges for natural language processing applications such as Machine Translation and Speech-to-Speech Translation. Detecting foreign words is essential for smooth processing of natural language. This paper introduces a statistical, language-independent approach for automatic detection of foreign words in mixed-language text. The approach is first applied to Benglish text, that is, Bangla text containing English words. The methodology can be readily adapted to other languages in which such code-mixing occurs. The proposed approach yields an accuracy of 71.82% when tested on sentences collected from Bangla blogs and social networking websites.
Keywords :
natural language processing; social networking (online); statistical analysis; text analysis; Bangla blogs; Bangla text; Benglish text; automatic English words detection; automatic foreign words detection; code-mixing; machine translation; mixed language; natural language processing applications; smooth natural language processing; social networking Web sites; speech-to-speech translation; statistical language independent approach; Blogs; Dictionaries; Grammar; Joints; Pragmatics; Social network services; Switches; Code Mixing; Code Switching; Foreign Inclusion; Foreign Words in Bangla; Foreign phrase fusion; Mixed Lingua;
Conference_Title :
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4673-4367-1
DOI :
10.1109/IHCI.2012.6481827