DocumentCode :
3102573
Title :
Improving Chinese to English SMT with Multiple CWS Results
Author :
Ma, Yongliang ; Zhao, Tiejun
Author_Institution :
MOE-Microsoft Key Lab. of Natural Language Process. & Speech, Harbin Inst. of Technol., Harbin, China
fYear :
2009
fDate :
7-9 Dec. 2009
Firstpage :
135
Lastpage :
140
Abstract :
In Chinese-to-English statistical machine translation (SMT), Chinese texts always need a pre-processing step which segments sentences into words, and the standard approach to this is Chinese word segmentation (CWS). However, CWS was not developed for SMT, so its results are not necessarily optimal for SMT. In recent years, many investigations have been performed on making CWS suitable for SMT, but we explore the problem from another direction. In this paper, our basic idea is to use multiple CWS results as additional language knowledge sources, and we present a simple and effective approach to using multiple CWS results for SMT. We also give experimental results over a range of strategy settings, and obtain substantial improvements in performance for translation from Chinese to English. The best result shows a gain of 1.89 BLEU percentage points over a state-of-the-art HPBT baseline system that does not use multiple CWS results.
Keywords :
language translation; word processing; Chinese to English language translation; Chinese word segmentation; statistical machine translation; Dictionaries; Hidden Markov models; Interpolation; Laboratories; Natural language processing; Natural languages; Speech processing; Support vector machines; Surface-mount technology; White spaces; Chinese word segmentation; SMT; feature blending; feature interpolation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3904-1
Type :
conf
DOI :
10.1109/IALP.2009.36
Filename :
5380785