Title :
Use of direct modeling in natural language generation for Chinese and English translation
Author :
Liu, Fu-Hua ; Gao, Yuqing
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper proposes a new direct-modeling-based approach to improve maximum-entropy-based natural language generation (NLG) in the IBM MASTOR system, an interlingua-based speech translation system. Because of the intrinsic disparity between Chinese and English sentences, the previous method used only linguistic constituents from output-language sentences to train the NLG model. The new algorithm uses a direct-modeling scheme, combined with a concept padding scheme, to admit linguistic constituent information from both the source and target languages into the training process seamlessly. When concept sequences from the top level of semantic parse trees are considered, the concept error rate (CER) is significantly reduced to 14.3%, compared to 23.9% for the baseline NLG. Similarly, when concept sequences from all levels of semantic parse trees are tested, the direct-modeling scheme yields a CER of 10.8%, compared to 17.8% for the baseline. Overall translation quality also improves, with the direct-modeling scheme raising the BLEU score from 0.252 to 0.294.
Keywords :
grammars; language translation; linguistics; maximum entropy methods; natural languages; BLEU score; CER reduction; Chinese-English translation; automatic speech-to-speech translation; concept error rate; concept padding scheme; interlingua-based speech translation system; linguistic constituent information; maximum entropy probability model; natural language generation direct modeling; semantic parse tree concept sequences; Entropy; Error analysis; Humans; Knowledge representation; Natural language processing; Natural languages; Protection; Speech recognition; Speech synthesis; Testing
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
DOI :
10.1109/CHINSL.2004.1409650