DocumentCode
3101888
Title
Automatic Acquisition of Large-Scale Academic Bilingual Parallel Corpus from the Web
Author
Yong, Han ; Yu, Li ; Xiaoning, He ; Muyun, Yang ; Guohua, Lei
Author_Institution
Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
fYear
2009
fDate
7-9 Dec. 2009
Firstpage
318
Lastpage
321
Abstract
In this paper, we describe a system that automatically acquires a large-scale Chinese-English bilingual parallel corpus from the China Journals Full-text Database (CJFD), a component of the China National Knowledge Infrastructure (CNKI). The system obtains a large amount of parallel text with domain information from the structured bilingual texts already present in CJFD, such as the Chinese and English abstracts and titles of academic articles. The acquired Chinese-English parallel corpus is several orders of magnitude larger than similar corpora known to us. In addition, the system collects a large number of bilingual terms that can be applied directly to lexical acquisition.
Keywords
Internet; data acquisition; linguistics; text analysis; China National Knowledge Infrastructure; Web; domain information; large-scale Chinese-English bilingual parallel corpus; lexical acquisition; parallel texts; Abstracts; Computer science; Data analysis; Data mining; Databases; Helium; Information analysis; Large-scale systems; Natural language processing; Web pages; bilingual parallel corpora acquisition; bilingual term acquisition; data mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location
Singapore
Print_ISBN
978-0-7695-3904-1
Type
conf
DOI
10.1109/IALP.2009.75
Filename
5380757