• DocumentCode
    2308560
  • Title
    A new factor for computing the relevance of a document to a query
  • Author
    Boughanem, Mohand ; Mallak, Ihab ; Prade, Henri
  • Author_Institution
    IRIT, Univ. of Toulouse, Toulouse, France
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper we propose a method for semantic text representation and term weighting. It is based on a semantic resource, WordNet, that provides meaning information and relations between the terms of a document. The heart of the proposed method is the way the concepts (terms) of documents are clustered and weighted. More precisely, we introduce two notions: the “centrality” of a term and its “specificity”. The centrality of a term is the number of terms of the document that are directly related to it in the same conceptual cluster. The specificity of a term is the depth of its concept in WordNet. These parameters differ from the usual term frequency (“tf”) and inverse document frequency (“idf”) used in classical information retrieval. The method has two steps: 1) matching document terms with WordNet concepts in order to obtain the most appropriate ones; 2) for each concept, calculating its centrality, using existing WordNet semantic relations, and its specificity. Preliminary experiments on TREC collections indicate that these parameters are useful.
  • Keywords
    pattern matching; query processing; text analysis; TREC collections; WordNet; document relevance; document terms matching; information retrieval; inverse term frequency; semantic resource; semantic text representation; specificity representation; term centrality; term weighting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1098-7584
  • Print_ISBN
    978-1-4244-6919-2
  • Type
    conf
  • DOI
    10.1109/FUZZY.2010.5584404
  • Filename
    5584404
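
The two notions defined in the abstract, centrality and specificity, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `RELATIONS` pairs and `DEPTH` values are invented toy data standing in for WordNet, conceptual clusters are approximated as connected components of the relation graph, and the function names are hypothetical. The record does not state how the paper builds clusters or combines the two values, so the sketch simply returns them as a pair.

```python
from collections import defaultdict

# Hypothetical toy data standing in for WordNet (values are invented):
# direct semantic relations between concepts, and each concept's depth.
RELATIONS = {
    ("dog", "canine"),
    ("canine", "wolf"),
    ("dog", "leash"),
    ("bank", "money"),
}
DEPTH = {"dog": 13, "canine": 12, "wolf": 14, "leash": 8, "bank": 7, "money": 6}


def build_neighbors(concepts, relations):
    """Map each document concept to the document concepts directly related to it."""
    present = set(concepts)
    neighbors = defaultdict(set)
    for a, b in relations:
        if a in present and b in present:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def find_clusters(concepts, neighbors):
    """Group concepts into connected components (an assumed stand-in for clusters)."""
    seen, result = set(), []
    for concept in concepts:
        if concept in seen:
            continue
        stack, component = [concept], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        result.append(component)
    return result


def score(concepts, relations, depth):
    """Return {concept: (centrality, specificity)} for the concepts of one document."""
    neighbors = build_neighbors(concepts, relations)
    scores = {}
    for component in find_clusters(concepts, neighbors):
        for concept in component:
            # Centrality: document concepts directly related within the same cluster.
            centrality = len(neighbors[concept] & component)
            # Specificity: depth of the concept (0 if unknown in the toy table).
            scores[concept] = (centrality, depth.get(concept, 0))
    return scores


document_concepts = ["dog", "canine", "wolf", "leash", "bank"]
scores = score(document_concepts, RELATIONS, DEPTH)
```

In this toy document, "dog" is directly related to "canine" and "leash", so its centrality is 2, while "bank" has centrality 0 because "money" does not occur in the document. With a real WordNet interface, the depth table and relation set would be derived from the resource rather than hard-coded.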