Automatic detection of English words in Benglish text: A statistical approach

Author

Kundu, Bijoy; Chandra, Swarup

Author_Institution

Language Technology, Centre for Development of Advanced Computing (CDAC), Kolkata, India

fYear

2012

Firstpage

1

Lastpage

4

Abstract

Code-mixing and code-switching pose challenges for natural language processing applications such as Machine Translation and Speech-to-Speech Translation. Detecting foreign words is essential for the smooth processing of natural language. This paper introduces a statistical, language-independent approach for automatically detecting foreign words in mixed-language text. The approach is first applied to Benglish text, that is, Bangla text containing English words, and the methodology can easily be adapted to other languages in which such code-mixing occurs. The proposed approach yields an accuracy of 71.82% when tested on sentences collected from Bangla blogs and social networking websites.
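The record does not describe the statistical model the authors use. Purely as an illustration of what word-level statistical language identification can look like, the sketch below scores each token under two add-one-smoothed character bigram models and picks the more likely language. This is a generic technique, not necessarily the paper's method. It assumes, hypothetically, that the Bangla is romanized (so script alone cannot separate the languages); the word lists, the labels "en"/"bn", and the names CharBigramModel, train, classify, and tag_sentence are all illustrative inventions.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # word-boundary markers added around every token


class CharBigramModel:
    """Add-one-smoothed character bigram model over lowercase words (illustrative)."""

    def __init__(self, words, vocab_size):
        self.pair_counts = Counter()    # counts of (previous char, next char)
        self.prefix_counts = Counter()  # counts of bigrams starting with each char
        self.vocab_size = vocab_size
        for w in words:
            padded = BOS + w.lower() + EOS
            for a, b in zip(padded, padded[1:]):
                self.pair_counts[(a, b)] += 1
                self.prefix_counts[a] += 1

    def log_prob(self, word):
        # Sum of log P(b | a) with Laplace (add-one) smoothing.
        padded = BOS + word.lower() + EOS
        return sum(
            math.log((self.pair_counts[(a, b)] + 1)
                     / (self.prefix_counts[a] + self.vocab_size))
            for a, b in zip(padded, padded[1:])
        )


def train(samples):
    """samples: dict mapping a language label to a list of training words."""
    symbols = {c for words in samples.values() for w in words for c in w.lower()}
    vocab_size = len(symbols) + 2  # letters + end marker + one slot for unseen chars
    return {label: CharBigramModel(words, vocab_size) for label, words in samples.items()}


def classify(models, word):
    # Choose the language whose model assigns the word the highest log-probability.
    return max(models, key=lambda label: models[label].log_prob(word))


def tag_sentence(models, sentence):
    return [(tok, classify(models, tok)) for tok in sentence.split()]


# Hypothetical toy training data; a real system would need far larger word lists.
EN_WORDS = ["the", "this", "that", "with", "thanks", "phone"]
BN_WORDS = ["ami", "tumi", "amar", "tomar", "kemon", "bhalo"]  # romanized Bangla
models = train({"en": EN_WORDS, "bn": BN_WORDS})

if __name__ == "__main__":
    print(tag_sentence(models, "amar phone"))  # [('amar', 'bn'), ('phone', 'en')]
```

Sharing one vocabulary size across both models keeps the smoothing identical, so the comparison in classify depends only on the observed character statistics of each language.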

Keywords

natural language processing; social networking (online); statistical analysis; text analysis; Bangla blogs; Bangla text; Benglish text; automatic English words detection; automatic foreign words detection; code-mixing; machine translation; mixed language; natural language processing applications; smooth natural language processing; social networking Web sites; speech-to-speech translation; statistical language independent approach; Blogs; Dictionaries; Grammar; Joints; Pragmatics; Social network services; Switches; Code Mixing; Code Switching; Foreign Inclusion; Foreign Words in Bangla; Foreign phrase fusion; Mixed Lingua;

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on

Conference_Location

Kharagpur

Print_ISBN

978-1-4673-4367-1

Type

conf

DOI

10.1109/IHCI.2012.6481827

Filename

6481827