  • Conference record number
    3540
  • Paper title

    Word-Level Confidence Estimation for Statistical Machine Translation using IBM-1 Model

  • Authors
    Mohammad Mahdi Mahsuli, Human Language Technology Lab, Department of Computer Engineering and Information Technology, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran; Shahram Khadivi, Human Language Technology Lab, Department of Computer Engineering and Information Technology, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
  • Keywords
    translation error rate, IBM-1 model, machine translation, confidence measure, confidence estimation, natural language processing
  • Publication year
    1392 (Solar Hijri calendar, i.e., 2013–2014 CE)
  • Conference title
    International Symposium on Artificial Intelligence and Signal Processing
  • Document language
    English
  • English abstract
    Confidence estimation for machine translation is a method for labeling each word in a machine translation system's output as “correct” or “incorrect”. In this paper, we present new confidence measures based on the IBM-1 model. Unlike many other confidence measures, they do not rely on system output such as N-best lists or word graphs, and they are very inexpensive to compute. Therefore, these confidence measures are applicable to any kind of machine translation system. Experiments have been performed on the translation of news lines for the English-Farsi language pair. The new confidence measures perform better than similar existing confidence measures. Moreover, we introduce a method to tag unlabeled training samples. This method, called translation error rate, has given promising results in machine translation but has not yet been used in confidence estimation.
  • Country
    Iran
  • Number of pages
    9
  • From page
    1
  • To page
    9
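The abstract names two ingredients: an IBM-1-based word confidence score that needs only the source sentence, the translation hypothesis and an IBM-1 lexicon (no N-best lists or word graphs), and translation error rate (TER) used to tag training words as correct or incorrect. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' new measures, which the abstract does not specify. The score is the commonly used IBM-1 average lexicon probability, p(e) = 1/(J+1) · Σ_j t(e | f_j), taken over the J source words plus a NULL word. The lexicon `t`, the `NULL` token, the threshold and all function names are hypothetical. The tagging step uses a plain Levenshtein alignment against a reference, i.e., TER without its block-shift operation, so it is a simplification of TER rather than TER itself.

```python
from typing import Dict, List, Tuple

# Hypothetical IBM-1 lexicon: t[(target_word, source_word)] = p(target | source).
Lexicon = Dict[Tuple[str, str], float]
NULL = "<NULL>"  # hypothetical empty-word token, as in IBM-1 alignment


def ibm1_word_confidence(target: List[str], source: List[str], t: Lexicon) -> List[float]:
    """Average IBM-1 lexicon probability of each target word over all source words plus NULL."""
    src = [NULL] + source
    return [sum(t.get((e, f), 0.0) for f in src) / len(src) for e in target]


def label_by_threshold(scores: List[float], threshold: float) -> List[str]:
    """Tag a word "correct" when its confidence reaches the (hypothetical) threshold."""
    return ["correct" if s >= threshold else "incorrect" for s in scores]


def alignment_labels(hyp: List[str], ref: List[str]) -> List[str]:
    """Tag hypothesis words via a Levenshtein alignment to a reference (TER without shifts)."""
    n, m = len(hyp), len(ref)
    # d[i][j] = minimum edits turning hyp[:i] into ref[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrace: only hypothesis words matched to an identical reference word are "correct";
    # substituted and inserted words keep the default "incorrect" tag.
    labels = ["incorrect"] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if hyp[i - 1] == ref[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            if cost == 0:
                labels[i - 1] = "correct"
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # hypothesis word with no reference counterpart
        else:
            j -= 1  # reference word missing from the hypothesis
    return labels
```

In such a setup, the alignment-based tags would supply labeled training words (for example, to choose the threshold), while the IBM-1 score can be computed at test time without any reference translation.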