DocumentCode
2823201
Title
English/Arabic bilingual dictionary construction using parallel texts from the Internet archive
Author
Fattah, M.A. ; Ren, Fuji ; Shingo, Kuroiwa ; Atlam, Alsayed
Author_Institution
Fac. of Eng., Tokushima Univ., Japan
Volume
2
fYear
2003
fDate
27-30 Dec. 2003
Firstpage
978
Abstract
To construct a good machine translation system, or to conduct natural language processing research on cross-language information retrieval, you must have a good parallel corpus. The Internet archive contains many parallel documents. To construct a good parallel corpus from the Internet archive, you must have a good bilingual dictionary. This paper describes an algorithm that automatically extracts an English/Arabic bilingual dictionary from parallel texts in the Internet archive. The system should preferably be useful for many different language pairs. Unlike most existing systems, our system can extract translation pairs from a very small parallel corpus. It can extract translations from as few as two sentences in one language and two sentences in the other, provided the requirements of the system are met. Moreover, the system can extract word pairs that are translations of each other, as well as the explanation of an Arabic or English word in the other language. The accuracy of the system is 59.1% in the case of one English word translated to one Arabic word, 23.9% in the case of one English word translated to more than one Arabic word (Arabic phrase), and 14.6% in the case of one Arabic word translated to more than one English word (English phrase).
Keywords
Internet; computational linguistics; dictionaries; language translation; natural languages; Arabic-English translation; English-Arabic bilingual dictionary construction; English-Arabic translation; Internet archives; machine translation system; natural language processing; parallel documents; parallel texts; translation pair extraction; Automatic testing; Data mining; Dictionaries; Information filtering; Information filters; Information retrieval; Internet; Natural language processing; Natural languages; Web pages; English/Arabic translation; Multilingual dictionaries; Parallel corpora;
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems, 2003 IEEE 46th Midwest Symposium on
ISSN
1548-3746
Print_ISBN
0-7803-8294-3
Type
conf
DOI
10.1109/MWSCAS.2003.1562450
Filename
1562450