Title :
A new factor for computing the relevance of a document to a query
Author :
Boughanem, Mohand ; Mallak, Ihab ; Prade, Henri
Author_Institution :
IRIT, Univ. of Toulouse, Toulouse, France
Abstract :
In this paper we propose a method for semantic text representation and term weighting. It relies on a semantic resource, WordNet, which provides information about the meanings of the terms of a document and the relations between them. The core of the proposed method is the way the concepts (terms) of a document are clustered and weighted. More precisely, we introduce two notions: the “centrality” of a term and its “specificity”. The centrality of a term is the number of terms of the document that are directly related to it within the same conceptual cluster. The specificity of a concept is its depth in the WordNet hierarchy. These parameters differ from the usual term frequency “tf” and inverse document frequency “idf” used in classical information retrieval. The method proceeds in two steps: 1) matching document terms with WordNet concepts in order to select the most appropriate ones; 2) computing, for each concept, its centrality, using existing WordNet semantic relations, and its specificity. Preliminary experiments on TREC collections show the usefulness of these parameters.
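The two parameters defined above can be illustrated with a minimal Python sketch. It assumes a tiny hand-written WordNet-like hierarchy (`HYPERNYM`) and relation set (`RELATIONS`) in place of real WordNet data, and it skips the term-to-concept matching step; all names are hypothetical. Specificity is taken as the number of hypernym links from a concept to the root, conceptual clusters are assumed to be the connected components of the graph of relations among the document's concepts, and centrality is the count of distinct document concepts directly related to a concept. How the paper combines these values into a final term weight is not stated in the abstract, so the sketch does not attempt it.

```python
from collections import defaultdict

# Hypothetical toy fragment of a WordNet-like noun hierarchy (child -> hypernym).
# Illustrative assumption only; this is NOT real WordNet data.
HYPERNYM = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "vehicle": "entity",
    "car": "vehicle",
}

# Hypothetical semantic relations between concepts, treated as undirected edges.
RELATIONS = {
    ("dog", "animal"),
    ("cat", "animal"),
    ("poodle", "dog"),
    ("car", "vehicle"),
}


def specificity(concept: str) -> int:
    """Depth of a concept: number of hypernym links between it and the root."""
    depth = 0
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        depth += 1
    return depth


def document_graph(concepts: set[str]) -> dict[str, set[str]]:
    """Adjacency sets restricted to relations whose two ends occur in the document."""
    graph = defaultdict(set)
    for a, b in RELATIONS:
        if a in concepts and b in concepts:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clusters(doc_concepts: list[str]) -> list[set[str]]:
    """Assumed conceptual clusters: connected components of the document graph."""
    concepts = set(doc_concepts)
    graph = document_graph(concepts)
    seen: set[str] = set()
    result = []
    for start in sorted(concepts):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            c = stack.pop()
            if c not in component:
                component.add(c)
                stack.extend(graph.get(c, set()) - component)
        seen |= component
        result.append(component)
    return result


def centrality(doc_concepts: list[str]) -> dict[str, int]:
    """Number of distinct document concepts directly related to each concept."""
    concepts = set(doc_concepts)
    graph = document_graph(concepts)
    return {c: len(graph.get(c, set())) for c in sorted(concepts)}


if __name__ == "__main__":
    doc = ["dog", "cat", "animal", "poodle", "car", "dog"]
    print(centrality(doc))  # {'animal': 2, 'car': 0, 'cat': 1, 'dog': 2, 'poodle': 1}
    print({c: specificity(c) for c in sorted(set(doc))})  # {'animal': 1, 'car': 2, 'cat': 2, 'dog': 2, 'poodle': 3}
    print(clusters(doc))
```

In this toy example `dog` occurs twice but gets the same centrality as `animal`, which occurs once: unlike tf, centrality as sketched here counts related concepts rather than occurrences, while specificity rewards deeper (more specific) concepts such as `poodle`.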
Keywords :
pattern matching; query processing; text analysis; TREC collections; WordNet; document relevance; document terms matching; information retrieval; inverse document frequency; semantic resource; semantic text representation; specificity representation; term centrality; term weighting
Conference_Title :
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6919-2
DOI :
10.1109/FUZZY.2010.5584404