DocumentCode
2308560
Title
A new factor for computing the relevance of a document to a query
Author
Boughanem, Mohand ; Mallak, Ihab ; Prade, Henri
Author_Institution
IRIT, Univ. of Toulouse, Toulouse, France
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
6
Abstract
In this paper, we propose a method for semantic text representation and term weighting. It relies on a semantic resource, WordNet, which provides meaning information and relations between the terms of a document. The core of the method is the way the concepts (terms) of a document are clustered and weighted. More precisely, we introduce two notions: the “centrality” of a term and its “specificity”. The centrality of a term is the number of terms in the document that are directly related to it within the same conceptual cluster. The specificity of a term is the depth of its concept in WordNet. These parameters differ from the usual term frequency (“tf”) and inverse document frequency (“idf”) used in classical information retrieval. The method has two steps: 1) matching document terms with WordNet concepts to select the most appropriate ones; 2) computing, for each concept, its centrality from existing WordNet semantic relations, and its specificity. Preliminary experiments on TREC collections show the practical value of these parameters.
Keywords
pattern matching; query processing; text analysis; TREC collections; WordNet; document relevance; document terms matching; information retrieval; inverse term frequency; semantic resource; semantic text representation; specificity representation; term centrality; term weighting
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location
Barcelona
ISSN
1098-7584
Print_ISBN
978-1-4244-6919-2
Type
conf
DOI
10.1109/FUZZY.2010.5584404
Filename
5584404