• DocumentCode
    2402431
  • Title

    Analysis of algorithms used to compute term discrimination values

  • Author

    Pushpalatha, K.P. ; Raju, G.

  • Author_Institution
    Sch. of Comput. Sci., Mahatma Gandhi Univ., Kottayam, India
  • fYear
    2010
  • fDate
    28-29 Dec. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Nowadays almost all areas of life use the Internet and search engines to obtain relevant and useful information on various topics. Large index databases are needed for automatic document search and retrieval from large document collections. Term weighting schemes are effective at identifying and selecting good indexing terms, but more efficient indexing terms can be generated using term discrimination values based on term weighting measures. The sum of the similarity coefficients between pairs of documents, computed for each term, determines the document space density of a collection. Good discriminating terms are those whose inclusion in, or elimination from, the documents of a collection produces a large change in the document space density. This change reflects the difference between pairs of documents and in turn provides a discrimination measure. An efficient search index can be created from such good discriminating terms, so that precision and recall rates can be improved. This paper presents a study and analysis of a set of algorithms that compute and use term discrimination values (TDV) to identify good discriminators and, in turn, to create a good search index. It is recognized that there is a crucial relationship between term frequencies and discrimination values. Discrimination values also depend on the type of measure used to determine the similarity coefficients.
  • Keywords
    Internet; data mining; database indexing; information retrieval; search engines; automatic document search; document collection; document pair; document retrieval; document space density; indexing term; large index database; search engine; search index; similarity coefficient; term discrimination value; Algorithm design and analysis; Approximation algorithms; Classification algorithms; Clustering algorithms; Complexity theory; Indexes; Vocabulary; TDV; Text mining; discrimination value model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on
  • Conference_Location
    Coimbatore
  • Print_ISBN
    978-1-4244-5965-0
  • Electronic_ISBN
    978-1-4244-5967-4
  • Type

    conf

  • DOI
    10.1109/ICCIC.2010.5705844
  • Filename
    5705844
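
The abstract describes the term discrimination value as the change in document space density when a term is removed from the collection. The following is a minimal sketch of that idea, not the paper's own algorithms. It assumes cosine similarity as the similarity coefficient, the average pairwise similarity as the density, and the sign convention "density without the term minus density with it"; the function names and the toy collection are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def space_density(docs):
    """Average similarity over all unordered document pairs (assumed density measure)."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_values(docs):
    """For each term k, return density(collection without term k) - density(collection)."""
    base = space_density(docs)
    values = []
    for k in range(len(docs[0])):
        # Drop column k from every document vector and recompute the density.
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(space_density(reduced) - base)
    return values


# Toy collection: rows are documents, columns are term weights.
# Term 0 occurs in every document; terms 1 and 2 each occur in one document.
docs = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
dv = discrimination_values(docs)
```

Under these assumptions a positive value marks a good discriminator: removing the term makes the documents look more alike, so the term was helping to tell them apart. In the toy collection the term shared by every document gets a negative value, while the two rare terms get equal positive values. This brute-force version recomputes every pairwise similarity for every term, so it is only practical for small collections.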