DocumentCode
3490213
Title
A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources
Author
Srivastava, Jaideep ; Sanyal, Subrata
Author_Institution
Inf. Technol. Indian Inst. of Inf. Technol., Allahabad, India
fYear
2012
fDate
13-15 Nov. 2012
Firstpage
185
Lastpage
188
Abstract
This paper presents an approach that improves word alignment performance for the English-Hindi language pair when resources are scarce. We improve the performance of IBM Models 1 and 2 by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall, and F-measure increase by approximately 15%, 11%, and 14%, respectively, and the Alignment Error Rate (AER) falls by approximately 14%. With IBM Model 2, precision, recall, and F-measure each increase by approximately 6%, and AER falls by approximately 6%. The experiments are based on the TDIL corpus.
Keywords
language translation; natural language processing; probability; statistical analysis; AER; English-Hindi language pair; English-Hindi parallel corpora; IBM model 1-2 algorithm; POS; TDIL corpus; alignment error rate; increase F-measure; increase precision; increase recall; part-of-speech tag; performance improvement; scarce resources; statistical machine translation; word alignment probability; Computational linguistics; Computational modeling; Error analysis; Hidden Markov models; Information technology; Tagging; Training; POS tagger; Scarce resources; Statistical Machine Translation; Word alignment;
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4673-6113-2
Electronic_ISBN
978-0-7695-4886-9
Type
conf
DOI
10.1109/IALP.2012.13
Filename
6473727