DocumentCode
3490087
Title
Building an English-Vietnamese Bilingual Corpus for Machine Translation
Author
Ngo, Quoc Hung; Winiwarter, Werner
Author_Institution
Faculty of Computer Science, University of Information Technology, Ho Chi Minh City, Vietnam
fYear
2012
fDate
13-15 Nov. 2012
Firstpage
157
Lastpage
160
Abstract
Bilingual corpora are critical resources for machine translation research and development because parallel corpora contain translation equivalences at various granularities. Manually annotated word alignments provide a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents research on building an English-Vietnamese parallel corpus, constructed for building a Vietnamese-English machine translation system. We describe the specification for collecting corpus data, linguistic tagging, bilingual annotation, and the tools developed specifically for the manual annotation. An English-Vietnamese bilingual corpus of over 800,000 sentence pairs and 10,000,000 English words, as well as Vietnamese words, has been collected and aligned at the sentence level, and over 45,000 sentence pairs of this corpus have been aligned at the word level.
Keywords
language translation; natural language processing; statistical analysis; text analysis; word processing; English words; English-Vietnamese bilingual corpus; English-Vietnamese parallel corpus; Vietnamese words; Vietnamese-English machine translation system; bilingual annotation; example-based machine translation models; linguistic tagging; machine translation research and development; parallel corpora; sentence level; sentence pairs; statistical machine translation models; word alignment annotation; Buildings; Conferences; Dictionaries; Educational institutions; Pragmatics; Stress; Tagging; English-Vietnamese corpus; word alignment
fLanguage
English
Publisher
IEEE
Conference_Titel
2012 International Conference on Asian Language Processing (IALP)
Conference_Location
Hanoi
Print_ISBN
978-1-4673-6113-2
Electronic_ISBN
978-0-7695-4886-9
Type
conf
DOI
10.1109/IALP.2012.30
Filename
6473720