Regional Information Center for Science and Technology (RICEST) - A Method of Construction of the Chinese and English Bilingual Translation Corpus Based on Web Data Mining

DocumentCode :

496868

Title :

A Method of Construction of the Chinese and English Bilingual Translation Corpus Based on Web Data Mining

Author :

Liu Dong-Fei ; Zhou Xing

Author_Institution :

Coll. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China

Volume :

fYear :

2009

fDate :

18-19 July 2009

Firstpage :

317

Lastpage :

319

Abstract :

This paper introduces a method for constructing a Chinese-English bilingual translation corpus based on web data mining. A web spider collects a large amount of page data, bilingual web pages are identified through a series of complex purification and analysis steps, and the DOM structures of the two page texts are then analyzed to obtain a Chinese-English parallel translation corpus, which is saved to a database. Because the corpus is accumulated automatically by machine, the process is more efficient, and because the translated content comes from the internet, the source material is rich and relatively accurate. The resulting corpus can provide good reference data for translation software.
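
The abstract gives no implementation details, so the following is only a minimal sketch of the DOM-analysis and storage steps under stated assumptions. It assumes that a Chinese page and its English counterpart have already been fetched and paired, that the two pages share the same sequence of block-level tags, and that a simple character-range check is enough to confirm each side's language. The names `BlockTextParser`, `extract_blocks`, `cjk_ratio`, `align_pages`, `save_pairs`, the `corpus` table, and the thresholds are all hypothetical and are not taken from the paper.

```python
import sqlite3
from html.parser import HTMLParser

# Assumption: these tags carry the translatable text blocks.
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "td"}


class BlockTextParser(HTMLParser):
    """Collect (tag, text) pairs for block-level elements in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace inside the block.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append((tag, text))
            self._tag = None


def extract_blocks(html):
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def cjk_ratio(text):
    """Fraction of non-space characters in the CJK Unified Ideographs range."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\u4e00" <= c <= "\u9fff" for c in chars) / len(chars)


def align_pages(zh_html, en_html):
    """Pair blocks by position, but only if both tag sequences match."""
    zh, en = extract_blocks(zh_html), extract_blocks(en_html)
    if [t for t, _ in zh] != [t for t, _ in en]:
        return []
    # Hypothetical thresholds: mostly CJK on one side, almost none on the other.
    return [(z, e) for (_, z), (_, e) in zip(zh, en)
            if cjk_ratio(z) > 0.5 and cjk_ratio(e) < 0.1]


def save_pairs(conn, pairs):
    conn.execute("CREATE TABLE IF NOT EXISTS corpus (zh TEXT, en TEXT)")
    conn.executemany("INSERT INTO corpus VALUES (?, ?)", pairs)
    conn.commit()
```

A possible use is `save_pairs(sqlite3.connect(":memory:"), align_pages(zh_html, en_html))`. Refusing to align pages whose tag sequences differ is a deliberately conservative choice for this sketch; a real system would likely need a more tolerant structural comparison.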

Keywords :

data mining; language translation; natural language processing; search engines; Chinese bilingual translation corpus; Chinese parallel translation corpus; English bilingual translation corpus; English parallel translation corpus; Web data mining; Web spider; bilingual Web page; software translation; Computer science; Data mining; Databases; Educational institutions; Electronic mail; Indexes; Information processing; Internet; Paper technology; Search engines; Bilingual Identification; Search Engine; Translation; Web Data Mining;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on

Conference_Location :

Shenzhen

Print_ISBN :

978-0-7695-3699-6

Type :

conf

DOI :

10.1109/APCIP.2009.87

Filename :

5197060

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=496868