DocumentCode :
2839417
Title :
Use of direct modeling in natural language generation for Chinese and English translation
Author :
Liu, Fu-Hua ; Gao, Yuqing
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2004
fDate :
15-18 Dec. 2004
Firstpage :
317
Lastpage :
320
Abstract :
This paper proposes a new direct-modeling-based approach to improve maximum-entropy-based natural language generation (NLG) in the IBM MASTOR system, an interlingua-based speech translation system. Because of the intrinsic disparity between Chinese and English sentences, the previous method used only linguistic constituents from output-language sentences to train the NLG model. The new algorithm uses a direct-modeling scheme, combined with a concept padding scheme, to bring linguistic constituent information from both the source and target languages into the training process. When concept sequences from the top level of semantic parse trees are considered, the concept error rate (CER) is reduced significantly to 14.3%, compared to 23.9% for the baseline NLG. Similarly, when concept sequences from all levels of semantic parse trees are tested, the direct-modeling scheme yields a CER of 10.8%, compared to 17.8% for the baseline. Overall translation quality also improves: the direct-modeling scheme raises the BLEU score from 0.252 to 0.294.
Keywords :
grammars; language translation; linguistics; maximum entropy methods; natural languages; BLEU score; CER reduction; Chinese-English translation; automatic speech-to-speech translation; concept error rate; concept padding scheme; interlingua-based speech translation system; linguistic constituent information; maximum entropy probability model; natural language generation direct modeling; semantic parse tree concept sequences; Entropy; Error analysis; Humans; Knowledge representation; Natural language processing; Natural languages; Protection; Speech recognition; Speech synthesis; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
Type :
conf
DOI :
10.1109/CHINSL.2004.1409650
Filename :
1409650