DocumentCode
3251704
Title
Adaptive parallel sentences mining from web bilingual news collection
Author
Zhao, Bing ; Vogel, Stephan
Author_Institution
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear
2002
fDate
2002
Firstpage
745
Lastpage
748
Abstract
In this paper, a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions, which are frequent in bilingual data collected from the web. The proposed approach is adaptive, iteratively updating the translation lexicon with the mined parallel data to improve vocabulary coverage and the estimation of translation probability parameters. Experiments are carried out on 10 years of the Xinhua bilingual news collection. Using the mined data, we obtain a significant improvement in word-to-word alignment accuracy in machine translation modeling.
Keywords
data mining; dynamic programming; language translation; maximum likelihood estimation; Web bilingual news collection; Xinhua bilingual news collection; adaptive approach; adaptive parallel sentences mining; lexicon-based models; machine translation modeling; maximum likelihood criterion; mined parallel data; sentence length models; translation probability parameter estimation; vocabulary coverage; Computer science; Information retrieval; Maximum likelihood estimation; Natural language processing; Natural languages; Parameter estimation; Probability; Robustness; Vocabulary; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1184044
Filename
1184044