Regional Information Center for Science and Technology - Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval

Title of article :

Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval

Author/Authors :

Zheng Ye, Jimmy Xiangji Huang, Ben He, Hongfei Lin

Issue Information :

Monthly, serial-numbered issue, year 2012

Pages :

From page :

2474

To page :

2487

Abstract :

Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language access and processing applications. To evaluate the quality of the mined CLAD and to demonstrate how it can be used in practice, we explore two applications of it to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that CLIR retrieval performance can be substantially improved with these two applications of CLAD, which indicates that the mined CLAD is of sound quality.

Keywords :

Information processing, Information retrieval, Web mining

Journal title :

Journal of the American Society for Information Science and Technology

Serial Year :

2012

Record number :

994776

Link To Document :

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=994776