DocumentCode
1619023
Title
Domain adaptation for statistical machine translation in development corpus selection
Author
Zheng, Zhongguang ; He, Zhongjun ; Meng, Yao ; Yu, Hao
Author_Institution
Fujitsu R&D Center Co., Ltd., Taiwan
fYear
2010
Firstpage
2
Lastpage
7
Abstract
The performance of a statistical machine translation (SMT) system is affected by its model parameters (e.g., the weights of feature functions), which are usually tuned on a development corpus. Most research to date has focused on algorithms for tuning parameters. However, the selection of the development corpus has received little discussion. It is believed that parameters tuned on a suitable corpus will improve translation performance. Instead of exploring new algorithms, this paper aims to select the development corpus for parameter tuning according to the test set. We treat this problem as domain adaptation and propose two methods, based on information retrieval (IR) and text clustering (TC) techniques, respectively. Experimental results show that both methods yield more stable performance for parameter tuning than subjective selection of the development corpus.
Keywords
information retrieval; language translation; statistical analysis; IR; SMT; TC; corpus selection development; domain adaptation; feature functions; information retrieval; model parameters; statistical machine translation; text clustering; tuning parameters; Adaptation model; Clustering methods; Feature extraction; Information retrieval; NIST; Training; Tuning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location
Beijing
Print_ISBN
978-1-4244-7821-7
Type
conf
DOI
10.1109/IUCS.2010.5666775
Filename
5666775