DocumentCode :
496868
Title :
A Method of Construction of the Chinese and English Bilingual Translation Corpus Based on Web Data Mining
Author :
Liu Dong-Fei ; Zhou Xing
Author_Institution :
Coll. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
Volume :
1
fYear :
2009
fDate :
18-19 July 2009
Firstpage :
317
Lastpage :
319
Abstract :
The paper introduces a method for constructing a Chinese-English bilingual translation corpus based on web data mining. A web spider collects a large amount of page data, bilingual web pages are identified through a series of complex purification and analysis steps, and the DOM structure of the two page texts is then analyzed to obtain a Chinese-English parallel translation corpus, which is saved to a database. Because the corpus is accumulated automatically by machine, construction is efficient, and because the translated content comes from the Internet, the source material is rich and relatively accurate. The resulting corpus can provide good reference data for translation software.
Keywords :
data mining; language translation; natural language processing; search engines; Chinese bilingual translation corpus; Chinese parallel translation corpus; English bilingual translation corpus; English parallel translation corpus; Web data mining; Web spider; bilingual Web page; software translation; Computer science; Data mining; Databases; Educational institutions; Electronic mail; Indexes; Information processing; Internet; Paper technology; Search engines; Bilingual Identification; Search Engine; Translation; Web Data Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-0-7695-3699-6
Type :
conf
DOI :
10.1109/APCIP.2009.87
Filename :
5197060