DocumentCode
1606485
Title
Automatic detection of English words in Benglish text: A statistical approach
Author
Kundu, Bijoy ; Chandra, Swarup
Author_Institution
Language Technol., Centre for Dev. of Adv. Comput. (CDAC), Kolkata, India
fYear
2012
Firstpage
1
Lastpage
4
Abstract
Code-mixing and code-switching pose challenges for natural language processing applications such as machine translation and speech-to-speech translation. Detecting foreign words is essential for smooth processing of natural language. This paper introduces a statistical, language-independent approach for automatically detecting foreign words in mixed-language text. The approach is first applied to Benglish text, that is, Bangla text that contains English words. The methodology can be readily adapted to other languages in which such code-mixing occurs. The proposed approach yields an accuracy of 71.82% when tested on sentences collected from Bangla blogs and social networking websites.
Keywords
natural language processing; social networking (online); statistical analysis; text analysis; Bangla blogs; Bangla text; Benglish text; automatic English words detection; automatic foreign words detection; code-mixing; machine translation; mixed language; natural language processing applications; smooth natural language processing; social networking Web sites; speech-to-speech translation; statistical language independent approach; Blogs; Dictionaries; Grammar; Joints; Pragmatics; Social network services; Switches; Code Mixing; Code Switching; Foreign Inclusion; Foreign Words in Bangla; Foreign phrase fusion; Mixed Lingua;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location
Kharagpur
Print_ISBN
978-1-4673-4367-1
Type
conf
DOI
10.1109/IHCI.2012.6481827
Filename
6481827