Title of article
Means and variances for a family of similarity indices used in cluster analysis
Author/Authors
Albatineh, Ahmed N.
Issue Information
Journal with serial number, year 2010
Pages
11
From page
2828
To page
2838
Abstract
Albatineh et al. (2006) introduced a family L of similarity indices. Members of this family are linear functions of the matching counts matrix [mij], where mij is the number of elements common to the ith cluster of one clustering and the jth cluster of another clustering of the same data set. Fowlkes and Mallows (1983) derived the mean and variance of the Rand (1971) index and of an index they called Bk (which is actually attributed to Ochiai, 1957), assuming fixed marginal totals of the matching counts matrix and independence of the clustering algorithms. This paper generalizes the Fowlkes and Mallows (1983) derivation of the mean and variance to any member of the L family, which makes comparing a wide family of indices much easier. Monte Carlo simulations are used to compare the shapes, means and variances of nine members of the L family for null case data (without clustering structure). Structured case simulations are used to evaluate the nine indices as tools for measuring cluster structure recovery. Data were generated from bivariate normal distributions.
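As an illustration of the quantities named in the abstract, the following is a minimal sketch that builds the matching counts [mij] from two label vectors and computes the Rand index and the Fowlkes and Mallows Bk index from them. It uses the standard pair-counting forms of these two indices and does not reproduce the paper's mean and variance derivation. The function names and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb, sqrt


def matching_counts(labels_a, labels_b):
    """Return {(i, j): m_ij}: the number of elements placed in cluster i
    by the first clustering and in cluster j by the second."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data set")
    return Counter(zip(labels_a, labels_b))


def rand_and_bk(labels_a, labels_b):
    """Rand index and Fowlkes-Mallows B_k from the matching counts.

    Standard pair-counting forms (an assumption of this sketch, not the
    paper's notation):
      t = pairs placed together by both clusterings
      p = pairs placed together by the first clustering (row marginals)
      q = pairs placed together by the second clustering (column marginals)
    """
    n = len(labels_a)
    m = matching_counts(labels_a, labels_b)
    t = sum(comb(v, 2) for v in m.values())
    p = sum(comb(v, 2) for v in Counter(labels_a).values())
    q = sum(comb(v, 2) for v in Counter(labels_b).values())
    total = comb(n, 2)
    # Agreements = pairs together in both + pairs apart in both.
    rand = (total + 2 * t - p - q) / total
    # B_k is undefined when either clustering has no co-clustered pair;
    # returning 0.0 in that case is a convention chosen for this sketch.
    bk = t / sqrt(p * q) if p and q else 0.0
    return rand, bk


if __name__ == "__main__":
    a = [0, 0, 1, 1]  # hypothetical first clustering
    b = [0, 0, 1, 2]  # hypothetical second clustering
    print(matching_counts(a, b))
    print(rand_and_bk(a, b))
```

Both indices depend on the data only through the matching counts and their marginal totals, which is why fixing the marginals, as in the abstract, is a natural setting for studying their null distributions.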
Keywords
Similarity index, Clustering algorithm, Rand index, Matching counts, Cluster analysis
Journal title
Journal of Statistical Planning and Inference
Serial Year
2010
Record number
2220902