A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources

Author

Srivastava, Jaideep ; Sanyal, Subrata

Author_Institution

Inf. Technol. Indian Inst. of Inf. Technol., Allahabad, India

fYear

2012

fDate

13-15 Nov. 2012

Firstpage

185

Lastpage

188

Abstract

This paper presents an approach that improves word alignment performance for the English-Hindi language pair when resources are scarce. We improve the performance of the IBM Model 1 and IBM Model 2 algorithms by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall, and F-measure increase by approximately 15%, 11%, and 14%, respectively, and the Alignment Error Rate (AER) falls by approximately 14%. With IBM Model 2, precision, recall, and F-measure each increase by approximately 6%, and AER falls by approximately 6%. The experiments are based on the TDIL corpus.
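
The abstract reports precision, recall, F-measure, and AER without giving their definitions. As a minimal sketch, the following assumes the commonly used definitions over sets of alignment links, with "sure" and "possible" gold links; the paper's exact evaluation setup is not stated here, so the function name `alignment_metrics`, its parameters, and the toy data are illustrative assumptions rather than the authors' implementation.

```python
def alignment_metrics(predicted, sure, possible=None):
    """Score predicted word-alignment links against gold links.

    Each link is a (source_index, target_index) pair.
    ASSUMPTION: standard sure/possible definitions; if no separate
    'possible' set is given, it defaults to the 'sure' set.
    """
    predicted = set(predicted)
    sure = set(sure)
    # By convention, every sure link is also a possible link.
    possible = sure if possible is None else set(possible) | sure

    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)

    precision = hits_possible / len(predicted) if predicted else 0.0
    recall = hits_sure / len(sure) if sure else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    total = len(predicted) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


# Toy example (hypothetical data): 4 predicted links, 2 of them correct.
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
guess = {(0, 0), (1, 2), (2, 1), (3, 3)}
scores = alignment_metrics(guess, gold)
print(scores)
```

When the sure and possible sets coincide, as in the toy example, AER reduces to 1 minus the F-measure, which is consistent with the abstract reporting F-measure gains and AER reductions of the same size.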

Keywords

language translation; natural language processing; probability; statistical analysis; AER; English-Hindi language pair; English-Hindi parallel corpora; IBM model 1-2 algorithm; POS; TDIL corpus; alignment error rate; increase F-measure; increase precision; increase recall; part-of-speech tag; performance improvement; scarce resources; statistical machine translation; word alignment probability; Computational linguistics; Computational modeling; Error analysis; Hidden Markov models; Information technology; Tagging; Training; POS tagger; Scarce resources; Statistical Machine Translation; Word alignment

fLanguage

English

Publisher

IEEE

Conference_Titel

Asian Language Processing (IALP), 2012 International Conference on

Conference_Location

Hanoi

Print_ISBN

978-1-4673-6113-2

Electronic_ISBN

978-0-7695-4886-9

Type

conf

DOI

10.1109/IALP.2012.13

Filename

6473727