DocumentCode :
3101776
Title :
Learning Method for Extraction of Partial Correspondence from Parallel Corpus
Author :
Terashima, Ryo ; Echizen-ya, Hiroshi ; Araki, Kenji
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
fYear :
2009
fDate :
7-9 Dec. 2009
Firstpage :
293
Lastpage :
298
Abstract :
For machine translation using a parallel corpus, it is effective to extract partial correspondences: pairs of phrases in the source language (SL) and target language (TL) within bilingual sentences. However, it is difficult to extract partial correspondences correctly and efficiently from a data-sparse corpus. In this paper, we propose a new learning method that extracts partial correspondences solely from the parallel corpus, without any analytical tools. In the proposed method, extraction rules are acquired automatically from bilingual sentences using bi-gram statistics within each language's sentences and a similarity, based on the Dice coefficient, between SL words and TL words. The acquired extraction rules hold information about the first parts (e.g., "a", "the") or the last parts of phrases. The partial correspondences are then extracted from the bilingual sentences correctly and efficiently using these extraction rules. Evaluation experiments indicated that the proposed method can improve the translation quality of learning-type machine translation by correctly and efficiently extracting the partial correspondences in bilingual sentences.
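The abstract names a Dice-coefficient similarity between SL words and TL words but does not give its exact formulation. As a rough illustration only, the sketch below uses the standard sentence-level co-occurrence form, Dice(s, t) = 2 * c(s, t) / (c(s) + c(t)), where c counts the sentence pairs containing the word or word pair. The function name `dice_scores` and the toy English-French corpus are assumptions made for this example and do not come from the paper.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Return Dice similarity for every co-occurring (SL word, TL word) pair.

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Counts are per sentence pair, so repeated words in one sentence count once.
    """
    src_count = Counter()   # number of sentence pairs containing each SL word
    tgt_count = Counter()   # number of sentence pairs containing each TL word
    pair_count = Counter()  # number of sentence pairs containing both words
    for src, tgt in bitext:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


# Toy data, invented for illustration (not from the paper's corpus).
toy_bitext = [
    ("the cat sleeps".split(), "le chat dort".split()),
    ("the dog sleeps".split(), "le chien dort".split()),
    ("a cat eats".split(), "un chat mange".split()),
]
scores = dice_scores(toy_bitext)
```

In this toy corpus, "cat" and "chat" always appear together and so receive the maximum score, while "the" and "chat" share only one of their sentence pairs and score lower. How the paper combines such scores with bi-gram statistics to form extraction rules is not stated in the abstract and is not modeled here.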
Keywords :
information retrieval; language translation; learning (artificial intelligence); text analysis; Dice coefficient; bi-gram statistics; bilingual sentences; data sparse corpus; learning-type machine translation; parallel corpus; partial correspondence extraction rule; source language; target language; translation quality; Data mining; Dictionaries; Feedback; Frequency; Information science; Learning systems; Machine learning; Natural languages; Statistics; Surface-mount technology; extraction rule; learning; machine translation; parallel corpus; partial correspondence;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing, 2009. IALP '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3904-1
Type :
conf
DOI :
10.1109/IALP.2009.69
Filename :
5380752