  • DocumentCode
    126872
  • Title
    A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections
  • Author
    Ibrahim, Osman A. S. ; Landa-Silva, Dario
  • Author_Institution
    ASAP Res. Group, Univ. of Nottingham, Nottingham, UK
  • fYear
    2014
  • fDate
    8-10 Sept. 2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper introduces a new weighting scheme in information retrieval. It also proposes using the document centroid as a threshold for normalizing documents in a document collection. Document centroid normalization helps to achieve more effective information retrieval as it enables good discrimination between documents. In the context of a machine learning application, namely unsupervised document indexing and retrieval, we compared the effectiveness of the proposed weighting scheme to the 'Term Frequency-Inverse Document Frequency' (TF-IDF) scheme, which is commonly used and considered one of the best existing weighting schemes. The paper shows how the document centroid is used to remove less significant weights from documents and how this helps to achieve better retrieval effectiveness. Most of the existing weighting schemes in information retrieval research assume that the whole document collection is static. The results presented in this paper show that the proposed weighting scheme can produce higher retrieval effectiveness than the TF-IDF weighting scheme, in both static and dynamic document collections. The results also show the variation in information retrieval effectiveness that is achieved for static and dynamic document collections by using a specific weighting scheme. This type of comparison has not been presented in the literature before.
  • Keywords
    document handling; indexing; information retrieval; learning (artificial intelligence); TF-IDF weighting scheme; document centroid normalization; dynamic document collections; information retrieval effectiveness; information retrieval research; machine learning application; static document collections; term frequency-inverse document frequency; unsupervised document indexing; Computer science; Educational institutions; Feature extraction; Indexes; Information retrieval; Training; Vectors; TF-IDF; document centroid; dynamic document collection; static document collection; term discrimination; term weight;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence (UKCI), 2014 14th UK Workshop on
  • Conference_Location
    Bradford
  • Type
    conf
  • DOI
    10.1109/UKCI.2014.6930160
  • Filename
    6930160
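The abstract names two ingredients without giving formulas: the TF-IDF baseline and a step that uses the document centroid as a threshold to remove less significant term weights. The sketch below illustrates both under stated assumptions. It uses the textbook TF-IDF form tf × log(N / df). It also assumes, hypothetically, that a document's "centroid" is the mean of its non-zero term weights, and that terms weighted below that mean are dropped. The paper's own proposed weighting scheme is not described in this record and is not reproduced here. The function names `tf_idf` and `centroid_threshold` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF: raw term frequency times log(N / df).

    `docs` is a list of token lists; returns one {term: weight} dict per document.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Iterating a Counter yields each distinct term once, so df counts documents.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def centroid_threshold(weights):
    """Hypothetical reading of centroid normalization (an assumption, not the
    paper's definition): keep only terms whose weight is at least the mean of
    the document's non-zero weights."""
    nonzero = [w for w in weights.values() if w > 0]
    if not nonzero:
        return {}
    centroid = sum(nonzero) / len(nonzero)
    return {t: w for t, w in weights.items() if w >= centroid}


if __name__ == "__main__":
    docs = [
        ["retrieval", "weighting", "retrieval", "centroid"],
        ["retrieval", "index"],
        ["weighting", "scheme"],
    ]
    for w in tf_idf(docs):
        print(sorted(centroid_threshold(w)))
```

In the first document, "weighting" has a weight of log(3/2) ≈ 0.41, which falls below that document's mean weight of about 0.77, so the thresholding step discards it. "retrieval" and "centroid" survive. In a dynamic collection, N and df change as documents arrive, so the IDF factor, and with it each document's centroid, would have to be recomputed.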