DocumentCode :
1954718
Title :
Evaluating the Quality of Web-Mined Bilingual Sentences Using Multiple Linguistic Features
Author :
Liu, Xiaohua ; Zhou, Ming
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Beijing, China
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
281
Lastpage :
284
Abstract :
We raise the problem of evaluating the quality of bilingual sentences mined from the web, which is critical for such applications as statistical machine translation (SMT) and English as a Second Language (ESL) learning. To tackle this problem, we propose a novel method that integrates multiple linguistic features related to spelling, grammar, and alignment, particularly the sentence type feature that indicates whether a sentence can be parsed by the Link Grammar Parser (LGP). Promising results are achieved on a bilingual corpus of about 6 million English-Chinese sentences mined from the web, indicating the effectiveness of our proposed method.
Keywords :
Internet; data mining; grammars; natural languages; Web-mined bilingual sentences; English as second language; link grammar parser; multiple linguistic features; statistical machine translation; Feature extraction; Grammar; Information filters; Noise measurement; Pragmatics; Support vector machines; bilingual sentence pairs; classification; linguistic quality evaluation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.12
Filename :
5681573