DocumentCode
178471
Title
On Validation of Clustering Techniques for Bibliographic Databases
Author
Mishra, S. ; Saha, S. ; Mondal, S.
Author_Institution
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Patna, Patna, India
fYear
2014
fDate
24-28 Aug. 2014
Firstpage
3150
Lastpage
3155
Abstract
In entity name disambiguation, evaluating the performance of any approach is difficult because the correct (actual) results are often not known. Three measures are generally used for evaluation: precision, recall and F-measure. All three are external validity indices because they require gold standard data. But in bibliographic databases such as DBLP, Arnetminer, Scopus, Web of Science and Google Scholar, gold standard data is not readily available and is very difficult to obtain because of the overlapping nature of the data. There is therefore a need for other metrics for evaluation. In this paper, some internal cluster validity index based schemes are proposed for evaluating entity name disambiguation algorithms applied to bibliographic data without using any gold standard datasets. Two new internal validity indices are also proposed for this purpose. Experimental results on seven bibliographic datasets show that the proposed internal cluster validity indices are able to compare the results obtained by different methods without prior/gold standard information. The paper thus demonstrates a novel way of evaluating any entity matching algorithm for bibliographic datasets without using any prior/gold standard information.
Keywords
bibliographic systems; database management systems; pattern clustering; DBLP; Scopus; Web of Science; Arnetminer; bibliographic databases; clustering technique validation; disambiguation algorithms; entity matching algorithm; external validity indices; F-measure; Google Scholar; performance evaluation; Clustering algorithms; Equations; Gold; Indexes; Information services; Mathematical model; Standards
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location
Stockholm
ISSN
1051-4651
Type
conf
DOI
10.1109/ICPR.2014.543
Filename
6977255