  • DocumentCode
    3318290
  • Title
    A maximum entropy-based sentence simplifier for machine translation
  • Author
    Finch, Andrew ; Shimohata, Mitsuo ; Sumita, Eiichiro
  • Author_Institution
    ATR Spoken Language Translation Res. Labs., Kyoto, Japan
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    646
  • Lastpage
    650
  • Abstract
    We present a method for removing unnecessary words from sentences to expedite automatic machine translation. Our hypothesis is that the resulting simplified sentences are easier to translate automatically, giving improved translation performance. We evaluate the sentence simplifier in two ways. First, the system is tested directly against humans on the word deletion task: its output is evaluated against a set of reference sentences, and its performance is compared with that of a test set of human-shortened sentences. We show that the system performs close to human level on this task. Second, we evaluate the system as a preprocessor for two different machine translation (MT) systems. We show that preprocessing the input significantly improves the performance of an MT system based on the publicly available GIZA++ software, and yields a small improvement in the performance of the more capable ATR translation system.
  • Keywords
    language translation; maximum entropy methods; natural languages; ATR translation system; GIZA++ software; automatic machine translation system; maximum entropy-based sentence simplifier; reference sentences; word deletion task; Cities and towns; Entropy; History; Humans; Laboratories; Natural language processing; Natural languages; Software performance; Speech; System testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598816
  • Filename
    1598816