DocumentCode
2402431
Title
Analysis of algorithms used to compute term discrimination values
Author
Pushpalatha, K.P. ; Raju, G.
Author_Institution
Sch. of Comput. Sci., Mahatma Gandhi Univ., Kottayam, India
fYear
2010
fDate
28-29 Dec. 2010
Firstpage
1
Lastpage
6
Abstract
Nowadays almost all areas of life use the Internet and search engines to obtain relevant and useful information on various topics. Automatic document search and retrieval from large document collections requires large index databases. Term weighting schemes are effective at identifying and selecting good indexing terms, but more effective indexing terms can be generated using term discrimination values based on term weighting measures. The sum of the similarity coefficients between pairs of documents, computed for each term, determines the document space density of a collection. Good discriminators are the terms whose inclusion in, or elimination from, the documents of a collection produces a large change in the document space density. This change reflects the difference between the pairs of documents and in turn provides a discrimination measure. An efficient search index can be created from such good discriminating terms, so that precision and recall rates can be improved. This paper presents a study and analysis of a set of algorithms that compute and use term discrimination values (TDV) to identify good discriminators and, in turn, to create a good search index. It is recognized that there is a crucial relationship between term frequencies and discrimination values. Discrimination values also depend on the type of measure used to determine the similarity coefficients.
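The density-based computation described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's own algorithms: it assumes documents are given as equal-length term-weight vectors, uses cosine similarity as the similarity coefficient, and takes the discrimination value of a term to be the density without that term minus the density with all terms. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Sum of similarity coefficients over all pairs of documents."""
    return sum(cosine(u, v) for u, v in combinations(docs, 2))


def discrimination_values(docs):
    """Return one value per term: density with the term removed minus the full density.

    A positive value means removing the term makes the documents look more
    alike, so the term helps to tell them apart (assumed sign convention).
    """
    base = space_density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [[w for j, w in enumerate(d) if j != k] for d in docs]
        values.append(space_density(reduced) - base)
    return values


if __name__ == "__main__":
    # Two documents over three terms; term 0 occurs in both documents.
    docs = [[1, 1, 0], [1, 0, 1]]
    print(discrimination_values(docs))
```

In this toy collection the term shared by both documents receives a negative value, while the two terms that each occur in only one document receive positive values. The sketch recomputes every pairwise similarity for each term, which is the brute-force approach and becomes costly for large collections.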
Keywords
Internet; data mining; database indexing; information retrieval; search engines; automatic document search; document collection; document pair; document retrieval; document space density; indexing term; large index database; search engine; search index; similarity coefficient; term discrimination value; Algorithm design and analysis; Approximation algorithms; Classification algorithms; Clustering algorithms; Complexity theory; Indexes; Vocabulary; TDV; Text mining; discrimination value model
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on
Conference_Location
Coimbatore
Print_ISBN
978-1-4244-5965-0
Electronic_ISBN
978-1-4244-5967-4
Type
conf
DOI
10.1109/ICCIC.2010.5705844
Filename
5705844