Title of article :
Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval
Author/Authors :
Zheng Ye1,2
Jimmy Xiangji Huang1,†
Ben He1
Hongfei Lin3
Issue Information :
Monthly, serial-numbered issue, 2012
Abstract :
Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language access and processing applications. To evaluate the quality of the mined CLAD and to demonstrate how it can be used in practice, we explore two applications of it to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that CLIR retrieval performance can be substantially improved with these two applications, which indicates that the mined CLAD is of sound quality.
Keywords :
Information processing, Information retrieval, Web mining
Journal title :
Journal of the American Society for Information Science and Technology