• DocumentCode
    1619023
  • Title
    Domain adaptation for statistical machine translation in development corpus selection
  • Author
    Zheng, Zhongguang ; He, Zhongjun ; Meng, Yao ; Yu, Hao

  • Author_Institution
    Fujitsu R&D Center Co., Ltd., Beijing, China
  • fYear
    2010
  • Firstpage
    2
  • Lastpage
    7
  • Abstract
    The performance of a statistical machine translation (SMT) system is affected by its model parameters (e.g., the weights of feature functions), which are usually tuned on a development corpus. Most research to date has focused on algorithms for tuning these parameters, whereas the selection of the development corpus has received little discussion. It is believed that parameters tuned on a suitable corpus will improve translation performance. Instead of exploring new algorithms, this paper aims to select the development corpus for parameter tuning according to the test set. We treat this problem as domain adaptation and propose two methods, based on information retrieval (IR) and text clustering (TC) techniques, respectively. Experimental results show that both methods yield more stable performance for parameter tuning than subjective selection of the development corpus.
  • Keywords
    information retrieval; language translation; statistical analysis; IR; SMT; TC; corpus selection development; domain adaptation; feature functions; model parameters; statistical machine translation; text clustering; tuning parameters; Adaptation model; Clustering methods; Feature extraction; NIST; Training; Tuning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Universal Communication Symposium (IUCS), 2010 4th International
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-7821-7
  • Type
    conf
  • DOI
    10.1109/IUCS.2010.5666775
  • Filename
    5666775