DocumentCode :
172501
Title :
One-expression classification in Bengali and its role in Bengali-English machine translation
Author :
Senapati, Apurbalal; Garain, U.
Author_Institution :
Indian Stat. Inst., Kolkata, India
fYear :
2014
fDate :
20-22 Oct. 2014
Firstpage :
162
Lastpage :
165
Abstract :
This paper analyzes one-expressions in Bengali and shows that classifying them is useful for machine translation. The characteristics of one-expressions are studied in a 177-million-word corpus. A classification scheme is proposed for grouping the one-expressions. The features contributing to the classification are identified, and a CRF-based classifier is trained on a dataset annotated by the authors, containing 2006 instances of one-expressions. The classifier's performance is evaluated on a test set of 300 instances of Bengali one-expressions that is separate from the training data. Evaluation shows that the classifier correctly classifies the one-expressions in 75% of cases. Finally, the utility of this classification task is investigated for Bengali-English machine translation. The translation accuracy improves from 39% (Google translator) to 60% (the proposed approach), and this improvement is found to be statistically significant. All the annotated datasets (none existed before) are made freely available to facilitate further research on this topic.
Keywords :
language translation; natural language processing; pattern classification; Bengali-English machine translation; CRF-based classifier; one-expression classification; Accuracy; Context; Fires; Google; Natural language processing; Random access memory; Training data; Bengali; corpus; machine translation; one-expressions
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
Type :
conf
DOI :
10.1109/IALP.2014.6973489
Filename :
6973489