DocumentCode
2193050
Title
Automatic Construction of Web-Based English/Chinese Parallel Corpora
Author
Tan Bin ; Zhou Xu-yan
Author_Institution
Dept. of Comput., Jinggangshan Univ., Ji'an, China
fYear
2010
fDate
2-4 April 2010
Firstpage
114
Lastpage
117
Abstract
As the demand for global information increases significantly, multilingual corpora have become a valuable linguistic resource for cross-lingual information retrieval and natural language processing. This paper presents a technique for automatically constructing a Web-based English-Chinese parallel corpus, addressing the shortage of English-Chinese parallel corpora. First, web pages that may be translations of one another are mined from a particular source. Candidate parallel page pairs are then extracted according to the similarity of the pages' external characteristics, and each candidate pair is assessed using content-based methods. For assessing the candidate pairs of parallel web pages, this paper designs the ECVS model, which measures bilingual text similarity based on the classic vector space model.
Keywords
Internet; content-based retrieval; natural language processing; English-Chinese parallel corpora; Web pages; Web-based parallel corpora; automatic construction technology; bilingual text similarity; content-based methods; cross-lingual information retrieval; multilingual corpora; vector space model; Informatics; Information security; Information technology; Jacobi correlation coefficient; Parallel corpora; vector space
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Information Technology and Security Informatics (IITSI), 2010 Third International Symposium on
Conference_Location
Jinggangshan
Print_ISBN
978-1-4244-6730-3
Electronic_ISBN
978-1-4244-6743-3
Type
conf
DOI
10.1109/IITSI.2010.124
Filename
5453637