DocumentCode
477923
Title
A Research on Length Based Sentence Alignment for Chinese-English Parallel Corpus
Author
Zan, Hongying ; Zhang, Xia ; Fan, Ming
Author_Institution
Coll. of Inf. & Eng., Zhengzhou Univ., Zhengzhou
Volume
4
fYear
2008
fDate
18-20 Oct. 2008
Firstpage
145
Lastpage
149
Abstract
Many existing length-based Chinese-English sentence alignment methods compute sentence length as the number of bytes. In this paper, we examine the effectiveness of six ways of computing sentence length, which take, respectively, the number of verbs, nouns, adjectives, content words, bytes, and all words in a sentence as its length. Most previous methods are found to be memory-consuming and inefficient. This paper proposes an alignment method that saves memory and time by grouping sentences for alignment. Our experimental results show that taking all words into account in the sentence length computation can further enhance alignment performance, giving 99.01% precision and 99.5% recall.
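To illustrate how the choice of length measure plugs into a length-based aligner, the following is a minimal sketch of a generic dynamic-programming aligner with a swappable length function. It is not the method proposed in the paper: the function names (`byte_length`, `word_length`, `mismatch`, `align`), the bead penalties, the mismatch formula, and the `ratio` parameter are all illustrative assumptions, and the grouping step mentioned in the abstract is not modeled. Word counting assumes that the sentences are already segmented into whitespace-separated tokens, which for Chinese text would require a separate word segmenter.

```python
import math
from typing import Callable, List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)


def byte_length(sentence: str) -> int:
    """Sentence length as a byte count (UTF-8 is an assumption, not from the paper)."""
    return len(sentence.encode("utf-8"))


def word_length(sentence: str) -> int:
    """Sentence length as the number of whitespace-separated tokens."""
    return len(sentence.split())


# Allowed bead shapes (source count, target count) with illustrative penalties.
BEAD_PENALTY = {(1, 1): 0.0, (2, 1): 1.0, (1, 2): 1.0, (1, 0): 3.0, (0, 1): 3.0}


def mismatch(src_len: int, tgt_len: int, ratio: float) -> float:
    """Hypothetical cost: scaled gap between expected and actual target length."""
    expected = src_len * ratio
    return abs(expected - tgt_len) / math.sqrt(expected + tgt_len + 1.0)


def align(src: List[str], tgt: List[str],
          length: Callable[[str], int] = word_length,
          ratio: float = 1.0) -> List[Bead]:
    """Return the lowest-cost sequence of beads covering both sentence lists."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = sum(length(x) for x in src[i:ni])
                t = sum(length(x) for x in tgt[j:nj])
                cost = best[i][j] + penalty + mismatch(s, t, ratio)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the bead sequence.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Passing `length=byte_length` instead of the default `word_length` switches the measure without touching the alignment logic; in that case `ratio` would need to reflect the typical byte-length ratio between the two languages, which this sketch leaves to the caller.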
Keywords
natural language processing; Chinese-English parallel corpus; length based sentence alignment; Concurrent computing; Dictionaries; Educational institutions; Fuzzy systems; Heuristic algorithms; Knowledge engineering; Large scale integration; Natural languages; Performance analysis; Terminology; NLP; parallel corpus; sentence alignment
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location
Jinan, Shandong
Print_ISBN
978-0-7695-3305-6
Type
conf
DOI
10.1109/FSKD.2008.307
Filename
4666373