Title :
Generating English-Persian parallel corpus using an automatic anchor finding sentence aligner
Author :
Yazdchi, Meisam Vosoughpour ; Faili, Heshaam
Author_Institution :
Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
Abstract :
The larger a parallel bilingual corpus is, the more effective and powerful it becomes. Building such corpora demands special effort both in finding as many already-translated texts as possible and in designing sentence alignment algorithms with as low a time complexity as possible. In this paper, we propose algorithms that align the sentences of two Persian-English texts in linear time and with surprisingly high accuracy. This linear time complexity is achieved through our new language-independent anchor-finding algorithm, which enables us to align a parallel text as large as a whole book in a single attempt and with high accuracy. As far as we know, this project is the first automatic construction of an English-Persian sentence-level parallel corpus.
Keywords :
computational complexity; natural language processing; text analysis; English-Persian parallel corpus; automatic anchor finding sentence aligner; linear time complexity; sentence alignment algorithms;
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587769