• DocumentCode
    3490213
  • Title
    A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources
  • Author
    Srivastava, Jaideep ; Sanyal, Subrata
  • Author_Institution
    Inf. Technol., Indian Inst. of Inf. Technol., Allahabad, India
  • fYear
    2012
  • fDate
    13-15 Nov. 2012
  • Firstpage
    185
  • Lastpage
    188
  • Abstract
    This paper presents an approach that improves word alignment performance for the English-Hindi language pair when resources are scarce. We improve the performance of the IBM Model 1 and Model 2 algorithms by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall and F-measure increase by approximately 15%, 11% and 14% respectively, and Alignment Error Rate (AER) falls by approximately 14%. With IBM Model 2, precision, recall and F-measure each increase by approximately 6%, and AER falls by approximately 6%. The experiments are based on the TDIL corpus.
  • Keywords
    language translation; natural language processing; probability; statistical analysis; AER; English-Hindi language pair; English-Hindi parallel corpora; IBM model 1-2 algorithm; POS; TDIL corpus; alignment error rate; increase F-measure; increase precision; increase recall; part-of-speech tag; performance improvement; scarce resources; statistical machine translation; word alignment probability; Computational linguistics; Computational modeling; Error analysis; Hidden Markov models; Information technology; Tagging; Training; POS tagger; Scarce resources; Statistical Machine Translation; Word alignment
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2012 International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4673-6113-2
  • Electronic_ISBN
    978-0-7695-4886-9
  • Type
    conf
  • DOI
    10.1109/IALP.2012.13
  • Filename
    6473727