Title :
Semiautomatic acquisition of translation templates from monolingual unannotated corpora
Author :
Hu, Rile; Zong, Chengqing; Xu, Bo
Author_Institution :
National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing, China
Abstract :
We propose a new approach that semiautomatically acquires translation templates from unannotated Chinese spoken-language corpora in the domain of travel information access. The approach introduces two elements, called extended contexts and cohesion degree, into unsupervised agglomerative clustering. With these two elements, the similarity between two entities in the corpora and their degree of cohesion can be described more precisely. In our approach, semantic and phrasal structures are first acquired from the unannotated corpus, and translation templates are then built manually on the basis of these structures. Preliminary experimental results show that the approach achieves higher performance than a method that uses only local contexts and mutual information.
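The abstract compares the proposed method against a baseline that relies only on local contexts and mutual information. The sketch below illustrates such a baseline under stated assumptions; it is not the authors' implementation. It assumes pre-segmented tokens, takes the local context of a word to be its immediate left and right neighbours, measures similarity as the cosine of context-count vectors, and scores the cohesion of an adjacent pair with pointwise mutual information (PMI). The paper's extended contexts and cohesion degree are not defined in the abstract and are not implemented here; all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def adjacent_pmi(sentences):
    """PMI of each adjacent token pair: log2(P(a, b) / (P(a) * P(b))).

    Simplifying assumption: P(a, b) is estimated from bigram counts and
    P(a), P(b) from unigram counts, each normalised by its own total.
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): math.log2((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }


def context_vectors(sentences):
    """Local context of each token: counts of its immediate left/right neighbours."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]]["L:" + padded[i - 1]] += 1
            vecs[padded[i]]["R:" + padded[i + 1]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def agglomerate(vecs, threshold):
    """Bottom-up clustering: start from singletons and repeatedly merge the most
    similar pair of clusters (cosine of their summed context vectors) until no
    pair is more similar than `threshold`."""
    clusters = [({w}, Counter(v)) for w, v in vecs.items()]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
    return [sorted(words) for words, _ in clusters]


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for segmented travel-domain utterances.
    corpus = [
        ["i", "want", "to", "go", "to", "beijing"],
        ["i", "want", "to", "go", "to", "shanghai"],
        ["i", "want", "a", "ticket", "to", "beijing"],
        ["i", "want", "a", "ticket", "to", "shanghai"],
    ]
    print(agglomerate(context_vectors(corpus), threshold=0.9))
    pmi = adjacent_pmi(corpus)
    print(sorted(pmi.items(), key=lambda kv: -kv[1])[:3])
```

On this toy corpus, "beijing" and "shanghai" have identical local contexts and are merged into one class, while the other words remain singletons at a threshold of 0.9. A context this narrow is exactly the limitation the abstract's extended contexts are meant to address.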
Keywords :
grammars; language translation; linguistics; natural languages; Chinese spoken language corpora; cohesion degree; monolingual unannotated corpora; phrasal structure; semantic structure; semiautomatic acquisition; translation template; travel information access; unsupervised agglomerative clustering method; Automation; Clustering methods; Costs; Encoding; Laboratories; Large-scale systems; Mutual information; Natural languages; Pattern recognition; Surface-mount technology;
Conference_Title :
Proceedings of the 2003 International Conference on Natural Language Processing and Knowledge Engineering (NLPKE 2003)
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275888