Title :
Evaluating the Quality of Web-Mined Bilingual Sentences Using Multiple Linguistic Features
Author :
Liu, Xiaohua ; Zhou, Ming
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Beijing, China
Abstract :
We address the problem of evaluating the quality of bilingual sentences mined from the web, which is critical for applications such as statistical machine translation (SMT) and English as a Second Language (ESL) learning. To tackle this problem, we propose a novel method that integrates multiple linguistic features related to spelling, grammar, and alignment, in particular a sentence type feature that indicates whether a sentence can be parsed by the Link Grammar Parser (LGP). Experiments on a bilingual corpus of about 6 million English-Chinese sentences mined from the web yield promising results, indicating the effectiveness of the proposed method.
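As a rough illustration of how spelling, grammar (sentence type), and alignment features might be integrated into a single quality decision, the sketch below packs them into a feature vector and applies a linear decision function of the form w·x + b, the form a linear-kernel SVM uses (the keywords mention support vector machines and classification). It is not the authors' implementation. The feature definitions, the names `PairFeatures`, `feature_vector`, `decision_value`, and `is_high_quality`, and the weights and bias are all hypothetical; the LGP result is taken as a precomputed boolean rather than obtained by calling the parser.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """Hypothetical per-pair features; definitions are illustrative assumptions."""
    spelling_error_rate: float  # assumed: fraction of English tokens not found in a lexicon
    lgp_parsable: bool          # sentence type: True if the Link Grammar Parser found a linkage
    alignment_score: float      # assumed: fraction of words with a dictionary translation


def feature_vector(f: PairFeatures) -> list[float]:
    # Encode the boolean sentence-type feature as 1.0 / 0.0.
    return [f.spelling_error_rate, 1.0 if f.lgp_parsable else 0.0, f.alignment_score]


def decision_value(x: list[float], weights: list[float], bias: float) -> float:
    # Linear decision function w . x + b, as in a linear-kernel SVM.
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def is_high_quality(f: PairFeatures, weights: list[float], bias: float) -> bool:
    # Non-negative decision value -> treat the pair as acceptable quality.
    return decision_value(feature_vector(f), weights, bias) >= 0.0


# Hypothetical weights for illustration only; a real system would learn them
# from manually labeled sentence pairs.
WEIGHTS = [-4.0, 1.5, 2.0]
BIAS = -1.0

good = PairFeatures(spelling_error_rate=0.0, lgp_parsable=True, alignment_score=0.8)
bad = PairFeatures(spelling_error_rate=0.3, lgp_parsable=False, alignment_score=0.2)

print(is_high_quality(good, WEIGHTS, BIAS))
print(is_high_quality(bad, WEIGHTS, BIAS))
```

With these made-up weights, the first pair (no spelling errors, parsable, well aligned) scores about 2.1 and is accepted, while the second scores about -1.8 and is rejected. Keeping the sentence-type feature as a separate binary input reflects the abstract's emphasis on whether LGP can parse the sentence.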
Keywords :
Internet; data mining; grammars; natural languages; Web-mined bilingual sentences; English as second language; link grammar parser; multiple linguistic features; statistical machine translation; Feature extraction; Grammar; Information filters; Noise measurement; Pragmatics; Support vector machines; bilingual sentence pairs; classification; linguistic quality evaluation;
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.12