Title :
A Method of Construction of the Chinese and English Bilingual Translation Corpus Based on Web Data Mining
Author :
Liu Dong-Fei ; Zhou Xing
Author_Institution :
Coll. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
Abstract :
The paper introduces a method for constructing a Chinese-English bilingual translation corpus based on web data mining. A web spider first collects a large amount of page data. Bilingual web pages are then identified through a series of elaborate purification and analysis steps. Finally, the DOM structures of each Chinese page and its English counterpart are analyzed to extract Chinese-English parallel translation text, which is saved to a database. Because the corpus is accumulated automatically by machine, it is built more efficiently. Because the translated content comes from the Internet, the source material is rich and relatively accurate. The resulting corpus can serve as useful reference data for translation software.
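The steps described in the abstract can be sketched as follows: tell Chinese text from English text, walk the DOM of a page pair, align text blocks, and store the pairs. This is a minimal illustration under stated assumptions, not the authors' method. The CJK/Latin character-ratio test, the 0.8 threshold, the set of block-level tags, the rule that pairs blocks by position only when both pages share the same block-tag sequence, and the SQLite table name `parallel` are all hypothetical choices. Crawling and page purification are omitted.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Hypothetical heuristics for illustration; none of these come from the paper.
CJK = re.compile(r"[\u4e00-\u9fff]")  # basic CJK Unified Ideographs block
LATIN = re.compile(r"[A-Za-z]")       # ASCII Latin letters
BLOCK_TAGS = {"title", "h1", "h2", "h3", "h4", "p", "li", "td", "th"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "wbr"}


def dominant_language(text: str, threshold: float = 0.8) -> str:
    """Label text 'zh' or 'en' when one script dominates its letters, else 'mixed'."""
    zh = len(CJK.findall(text))
    en = len(LATIN.findall(text))
    total = zh + en
    if total == 0:
        return "mixed"
    if zh / total >= threshold:
        return "zh"
    if en / total >= threshold:
        return "en"
    return "mixed"


class BlockExtractor(HTMLParser):
    """Collect (tag, text) for each block-level element, in closing order."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open non-void tags
        self.open_blocks = []  # [tag, [text fragments]] for open block elements
        self.blocks = []       # finished (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never receive a closing tag
        self.stack.append(tag)
        if tag in BLOCK_TAGS:
            self.open_blocks.append([tag, []])

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Pop back to the matching open tag, implicitly closing unclosed children.
        while True:
            top = self.stack.pop()
            if top in BLOCK_TAGS:
                block_tag, parts = self.open_blocks.pop()
                text = " ".join("".join(parts).split())  # collapse whitespace
                if text:
                    self.blocks.append((block_tag, text))
            if top == tag:
                break

    def handle_data(self, data):
        # Attribute text (including inline <b>, <a>, ...) to the innermost open block.
        if self.open_blocks:
            self.open_blocks[-1][1].append(data)


def extract_blocks(html: str) -> list[tuple[str, str]]:
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def align_blocks(zh_html: str, en_html: str) -> list[tuple[str, str]]:
    """Pair blocks by position when both pages share the same block-tag sequence."""
    zh_blocks = extract_blocks(zh_html)
    en_blocks = extract_blocks(en_html)
    if [t for t, _ in zh_blocks] != [t for t, _ in en_blocks]:
        return []  # DOM structures diverge; a real system needs fuzzier alignment
    return [
        (zh, en)
        for (_, zh), (_, en) in zip(zh_blocks, en_blocks)
        if dominant_language(zh) == "zh" and dominant_language(en) == "en"
    ]


def save_pairs(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> None:
    """Store pairs in a hypothetical `parallel` table, skipping duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS parallel (zh TEXT, en TEXT, UNIQUE (zh, en))")
    conn.executemany("INSERT OR IGNORE INTO parallel (zh, en) VALUES (?, ?)", pairs)
    conn.commit()
```

Suppose both pages of a pair use the same `title`, `h1`, `p` layout. In that case `align_blocks` returns one (Chinese, English) tuple per block, and `save_pairs` writes them to an SQLite database such as `sqlite3.connect(":memory:")`. Pages whose block structures differ yield an empty list. A fuller system would need a more tolerant alignment at that point, for example one based on sequence alignment of the DOM trees.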
Keywords :
data mining; language translation; natural language processing; search engines; Chinese bilingual translation corpus; Chinese parallel translation corpus; English bilingual translation corpus; English parallel translation corpus; Web data mining; Web spider; bilingual Web page; software translation; Computer science; Data mining; Databases; Educational institutions; Electronic mail; Indexes; Information processing; Internet; Paper technology; Search engines; Bilingual Identification; Search Engine; Translation; Web Data Mining;
Conference_Title :
Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-0-7695-3699-6
DOI :
10.1109/APCIP.2009.87