Title of article :
Automatic Generation of Japanese–English Bilingual
Thesauri Based on Bilingual Corpora
Author/Authors :
Keita Tsuji and Kyo Kageura
Issue Information :
Monthly, with serial issue number, year 2006
Abstract :
The authors propose a method for automatically generating
Japanese–English bilingual thesauri based on bilingual
corpora. The term "bilingual thesaurus" refers to a set
of bilingual equivalent words and their synonyms. Most
of the methods proposed so far for extracting bilingual
equivalent word clusters from bilingual corpora depend
heavily on word frequency and are not effective for dealing
with low-frequency clusters. These low-frequency
bilingual clusters are worth extracting because they contain
many newly coined terms that are in demand but are
not listed in existing bilingual thesauri. Assuming that
single language-pair-independent methods such as
frequency-based ones have reached their limitations and
that a language-pair-dependent method used in combination
with other methods shows promise, the authors
propose the following approach: (a) extract translation
pairs based on transliteration patterns; (b) remove the
pairs from among the candidate words; (c) extract translation
pairs based on word frequency from the remaining
candidate words; and (d) generate bilingual clusters
based on the extracted pairs using a graph-theoretic
method. The proposed method has been found to be
significantly more effective than other methods.
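
The four steps (a) to (d) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the transliteration patterns, the frequency measure, or the graph-theoretic method, so the toy katakana table (KANA), the string-similarity threshold, the Dice coefficient, and the use of connected components are all assumptions made for this example, as are the function names and the tiny CORPUS.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Assumption: a toy katakana-to-romaji table covering only the demo words.
KANA = {"グ": "gu", "ラ": "ra", "フ": "fu", "デ": "de", "ー": "", "タ": "ta"}

def romanize(word):
    """Rough romanization, or None if any character is not in KANA."""
    if any(ch not in KANA for ch in word):
        return None
    return "".join(KANA[ch] for ch in word)

def transliteration_pairs(ja_words, en_words, threshold=0.5):
    """Step (a): pair words whose romanization resembles an English word."""
    pairs = []
    if not en_words:
        return pairs
    for ja in sorted(ja_words):
        roman = romanize(ja)
        if roman is None:
            continue
        score = lambda en: SequenceMatcher(None, roman, en).ratio()
        best = max(sorted(en_words), key=score)
        if score(best) >= threshold:
            pairs.append((ja, best))
    return pairs

def frequency_pairs(corpus, ja_words, en_words, threshold=0.8):
    """Step (c): pair remaining words by Dice coefficient (an assumed measure)."""
    ja_count, en_count, both = defaultdict(int), defaultdict(int), defaultdict(int)
    for ja_sent, en_sent in corpus:
        ja_set, en_set = set(ja_sent) & ja_words, set(en_sent) & en_words
        for ja in ja_set:
            ja_count[ja] += 1
        for en in en_set:
            en_count[en] += 1
        for ja in ja_set:
            for en in en_set:
                both[(ja, en)] += 1
    return sorted(
        (ja, en) for (ja, en), n in both.items()
        if 2 * n / (ja_count[ja] + en_count[en]) >= threshold
    )

def clusters(pairs):
    """Step (d): connected components of the bipartite pair graph (assumed)."""
    adj = defaultdict(set)
    for ja, en in pairs:
        adj[("ja", ja)].add(("en", en))
        adj[("en", en)].add(("ja", ja))
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result

def build_thesaurus(corpus):
    ja_words = {w for ja_sent, _ in corpus for w in ja_sent}
    en_words = {w for _, en_sent in corpus for w in en_sent}
    translit = transliteration_pairs(ja_words, en_words)      # (a)
    ja_rest = ja_words - {ja for ja, _ in translit}           # (b)
    en_rest = en_words - {en for _, en in translit}
    freq = frequency_pairs(corpus, ja_rest, en_rest)          # (c)
    return clusters(translit + freq)                          # (d)

# Hypothetical sentence-aligned corpus, already tokenized.
CORPUS = [
    (["グラフ", "理論"], ["graph", "theory"]),
    (["データ", "理論"], ["data", "theory"]),
    (["グラフ", "データ"], ["graph", "data"]),
]
```

Calling build_thesaurus(CORPUS) returns a list of clusters, each a set of ("ja", word) and ("en", word) nodes. In this sketch the katakana loanwords are paired by transliteration first, so the frequency step only has to resolve the remaining words; a word linked to several partners would pull them all into one cluster.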
Journal title :
Journal of the American Society for Information Science and Technology