DocumentCode :
3317686
Title :
Sentence alignment using hybrid model
Author :
Fattah, Mohamed Abdel ; Ren, Fuji ; Kuroiwa, Shingo
Author_Institution :
Fac. of Eng., Tokushima Univ., Japan
fYear :
2005
fDate :
30 Oct.-1 Nov. 2005
Firstpage :
388
Lastpage :
392
Abstract :
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on the character length of the text between successive punctuation marks. A probabilistic score is assigned to each proposed correspondence of texts, based on the scaled difference of the lengths of the two texts (in characters) and the variance of this difference. With this score, the time required to match punctuation marks decreased and sentence alignment precision increased. The new approach achieved a 21.8% improvement over the length-based approach when applied to English-Arabic parallel documents.
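The abstract does not state the scoring formula. As a rough illustration only, the sketch below assumes a formulation in the style of Gale and Church's length-based aligner: the length difference is scaled by an assumed variance, and the two-sided tail probability of a standard normal is used as the score. The function name `length_score` and the parameter values `c` (expected target characters per source character) and `s2` (variance per character) are assumptions for illustration, not values taken from this paper.

```python
import math


def length_score(len_src, len_tgt, c=1.0, s2=6.8):
    """Score a candidate pairing of two text spans by their character lengths.

    Assumed formulation (not confirmed by the abstract): the scaled length
    difference is treated as a standard normal variable, and the two-sided
    tail probability is returned. Values near 1.0 mean the lengths match well.
    """
    if len_src == 0 and len_tgt == 0:
        return 1.0  # two empty spans trivially match
    # Average length expressed in source-side characters.
    mean_len = (len_src + len_tgt / c) / 2.0
    # Scaled difference: observed minus expected target length,
    # divided by the assumed standard deviation for spans of this size.
    delta = (len_tgt - len_src * c) / math.sqrt(mean_len * s2)
    # Two-sided tail probability 2 * (1 - Phi(|delta|)), written with erfc.
    return math.erfc(abs(delta) / math.sqrt(2.0))


if __name__ == "__main__":
    # Spans of similar length should score higher than mismatched ones.
    print(length_score(100, 104))
    print(length_score(100, 160))
```

For a language pair such as English-Arabic, `c` and `s2` would need to be estimated from already aligned data; the defaults here are placeholders.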
Keywords :
language translation; linguistics; natural languages; statistical analysis; bilingual parallel corpora; cross language information retrieval; machine translation; multilingual natural language processing; sentence aligned parallel corpora; Dictionaries; Dolphins; Information retrieval; Natural language processing; Natural languages; Performance analysis; Rivers; Seals; Terminology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
Type :
conf
DOI :
10.1109/NLPKE.2005.1598768
Filename :
1598768