• DocumentCode
    259565
  • Title
    Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment
  • Author
    Mohaghegh, Mahsa ; Sarrafzadeh, Hossein ; Mohammadi, Mehdi
  • Author_Institution
    Dept. of Comput., Unitec, Auckland, New Zealand
  • fYear
    2014
  • fDate
    3-6 Dec. 2014
  • Firstpage
    61
  • Lastpage
    66
  • Abstract
    Statistical word alignment models need large amounts of training data and perform poorly on small corpora. This paper proposes an unsupervised hybrid word alignment technique based on ensemble learning. The algorithm runs three base alignment models over several rounds to generate alignments. The ensemble algorithm uses a weighted scheme for resampling the training data and a voting score to combine the aggregated alignments. The base alignment algorithms used in this study are IBM Model 1, IBM Model 2, and a heuristic method based on the Dice measure. Our experimental results show that this approach can improve the alignment error rate of the base alignment models by at least 15%.
  • Keywords
    statistical analysis; text analysis; unsupervised learning; word processing; IBM Model; dice measurement; ensemble learning method; small-sized corpora; statistical word alignment model; underlying alignment algorithms; unsupervised hybrid word alignment technique; Boosting; Computational modeling; Error analysis; Hidden Markov models; Mathematical model; Training; Training data; ensemble learning; heuristic word alignment; statistical word alignment;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2014 13th International Conference on
  • Conference_Location
    Detroit, MI
  • Type
    conf
  • DOI
    10.1109/ICMLA.2014.15
  • Filename
    7033092
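
The abstract names a Dice-based heuristic aligner and a voting step over aggregated alignments. The following is a minimal sketch of those two ideas only, not the authors' implementation. It rests on assumptions that do not come from the record: the function names (`dice_scores`, `dice_align`, `vote`) are hypothetical, the Dice score is computed from sentence-level co-occurrence counts, and a plain vote count with a fixed threshold stands in for the paper's voting score. The weighted resampling rounds and IBM Models 1 and 2 are not shown.

```python
from collections import Counter
from itertools import product


def dice_scores(corpus):
    """Dice coefficient for every co-occurring (source, target) word pair.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Assumed definition: dice(s, t) = 2 * C(s, t) / (C(s) + C(t)),
    where each count is the number of sentence pairs containing the word(s).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def dice_align(src, tgt, scores):
    """Link each target position to the source position with the highest score."""
    links = set()
    if not src:
        return links
    for j, t in enumerate(tgt):
        best_i = max(range(len(src)), key=lambda i: scores.get((src[i], t), 0.0))
        links.add((best_i, j))
    return links


def vote(alignments, min_votes=2):
    """Keep the links proposed by at least min_votes of the given alignments.

    Assumption: a simple count threshold, used here in place of the
    voting score described in the abstract.
    """
    votes = Counter(link for alignment in alignments for link in alignment)
    return {link for link, n in votes.items() if n >= min_votes}


# Toy parallel corpus (illustrative data, not from the paper).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
scores = dice_scores(corpus)
links = dice_align(["das", "haus"], ["the", "house"], scores)
combined = vote([links, {(0, 0), (0, 1)}, {(0, 0), (1, 1)}])
```

Here `links` holds `(source_index, target_index)` pairs for one sentence pair, and `vote` merges that output with the link sets of two other hypothetical aligners.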