Title :
Ensemble Statistical and Heuristic Models for Unsupervised Word Alignment
Author :
Mohaghegh, Mahsa ; Sarrafzadeh, Hossein ; Mohammadi, Mehdi
Author_Institution :
Dept. of Comput., Unitec, Auckland, New Zealand
Abstract :
Statistical word alignment models require large amounts of training data and perform poorly on small corpora. This paper proposes an unsupervised hybrid word alignment technique based on an ensemble learning method. The algorithm runs three base alignment models over several rounds to generate alignments. The ensemble algorithm uses a weighted scheme for resampling the training data and a voting score over the aggregated alignments. The base alignment algorithms used in this study are IBM Model 1, IBM Model 2, and a heuristic method based on the Dice measure. Our experimental results show that this approach can improve the alignment error rate of the base alignment models by at least 15%.
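As a rough illustration of two ingredients named in the abstract, the following Python sketch computes a Dice-based heuristic alignment and combines several alignments by weighted voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`dice_scores`, `align`, `vote`), the best-target linking rule, and the 0.5 vote threshold are all hypothetical, and the weighted resampling of training data across rounds is omitted.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every co-occurring (source, target) word pair.

    bitext is a list of (source_tokens, target_tokens) sentence pairs.
    Dice(s, t) = 2 * C(s, t) / (C(s) + C(t)), counted per sentence pair.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align(src, tgt, scores):
    """Assumed linking rule: each source position links to its best-scoring target."""
    links = set()
    for i, s in enumerate(src):
        best_j = max(range(len(tgt)), key=lambda j: scores.get((s, tgt[j]), 0.0))
        links.add((i, best_j))
    return links


def vote(alignments, weights, threshold=0.5):
    """Assumed voting rule: keep links whose weighted vote share exceeds threshold."""
    total = sum(weights)
    tally = Counter()
    for links, w in zip(alignments, weights):
        for link in links:
            tally[link] += w
    return {link for link, v in tally.items() if v / total > threshold}


# Toy parallel corpus (illustrative data only).
bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
scores = dice_scores(bitext)
dice_links = align(["the", "house"], ["das", "haus"], scores)

# Pretend two other base models produced these alignments for the same pair.
other_a = {(0, 0), (1, 0)}
other_b = {(0, 0), (1, 1)}
combined = vote([dice_links, other_a, other_b], weights=[1.0, 1.0, 1.0])
```

In this sketch the voting weights are uniform; an ensemble in the spirit of the abstract would presumably derive them from each model's performance in each round.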
Keywords :
statistical analysis; text analysis; unsupervised learning; word processing; IBM Model; dice measurement; ensemble learning method; small-sized corpora; statistical word alignment model; underlying alignment algorithms; unsupervised hybrid word alignment technique; Boosting; Computational modeling; Error analysis; Hidden Markov models; Mathematical model; Training; Training data; ensemble learning; heuristic word alignment; statistical word alignment;
Conference_Title :
Machine Learning and Applications (ICMLA), 2014 13th International Conference on
Conference_Location :
Detroit, MI
DOI :
10.1109/ICMLA.2014.15