Title :
Extracting historical terms based on aligned Chinese-English parallel corpora
Author :
Xiuying Li ; Chao Che ; Limin Han ; Xiaoxia Liu
Author_Institution :
Dalian Univ. of Technol., Dalian, China
Abstract :
This paper examines the feasibility of applying statistics-oriented term extraction and evaluation methods to extract historical terms from aligned parallel corpora of Chinese historical classics and their translations. It proposes using transliterations as anchor points to establish sentence-level alignment. It also investigates an approach to extracting term translation pairs from 4000 parallel sentences or sentence segments drawn from the corpora of the Chinese historical classic Shi Ji (Records of the Historian) and its English translations by two well-known translators. The experimental results indicate that the statistically sound algorithm successfully extracts terms whose English translations are consistent throughout the corpus, as well as transliterated pairs, but fails to extract the translations of terms that the two translators render differently, even though both translations may be equally acceptable in English usage. The algorithm also fails to extract the highest-frequency terms, which are ambiguous in meaning because their part of speech changes. The paper therefore suggests that insights from linguistics and translation studies can be integrated with statistical measures to improve the extraction and validation results.
Keywords :
computational linguistics; information retrieval; language translation; natural language processing; text analysis; Chinese historical classic Shi Ji; Chinese-English parallel corpora; English language; English translations; historical terms; linguistic; sentence-level alignment; statistic-oriented term evaluation; statistic-oriented term extraction; Channel hot electron injection; Chaos; Data mining; Frequency; Gain measurement; History; Natural language processing; Natural languages; Statistics; Terminology; Chinese historical classics; Historical term; extraction; parallel corpora
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
DOI :
10.1109/NLPKE.2009.5313766