Title :
Analysis of paraphrased corpus and lexical-based approach to Chinese paraphrasing
Author :
Zhang, Yan ; Kashioka, Hideki
Author_Institution :
ATR Spoken Language Translation Res. Labs., Kyoto, Japan
Abstract :
We first analyze the language phenomena and distribution characteristics of Chinese spontaneous utterances that have already been paraphrased by other approaches. Based on the information obtained from this corpus, we propose a lexical-based approach to paraphrasing Chinese spoken language. Our purpose is to transform varied expressions into simplified expressions with the same meaning. Verbs are the main constituents of Chinese sentences, and their flexibility gives them an important role in expressing sentence structure, especially in the case of transitive verbs. Furthermore, negative verb expressions frequently appear in question utterances to express enquiries. We therefore design four types of paraphrasing templates based on lexical information and the characteristics of the corpus: (1) synonym replacement; (2) Chinese transitive verbs; (3) verbs with two objects; (4) the transformation of negative expressions. Our experiment shows that the lexical-based approach is effective for Chinese paraphrasing.
Keywords :
linguistics; natural languages; speech processing; speech recognition; Chinese paraphrasing; Chinese spontaneous utterances; Chinese transitive verbs; language distribution characteristics; language phenomena; lexical-based approach; negative verb expressions; paraphrased corpus; paraphrasing templates; simplified expressions; spoken language translation; synonym replacement; transitive verbs; Cities and towns; Databases; Information analysis; Laboratories; Natural languages; Tagging
Conference_Title :
Chinese Spoken Language Processing, 2004 International Symposium on
Print_ISBN :
0-7803-8678-7
DOI :
10.1109/CHINSL.2004.1409652