• Title of article

    Means and variances for a family of similarity indices used in cluster analysis

  • Author/Authors

    Albatineh, Ahmed N.

  • Issue Information
    Journal, serial issue, year 2010
  • Pages
    11
  • From page
    2828
  • To page
    2838
  • Abstract
    Albatineh et al. (2006) introduced a family L of similarity indices. Members of this family are linear functions of the matching counts matrix [mij], where mij is the number of elements common to the ith and jth clusters resulting from two clusterings of the same data set. Fowlkes and Mallows (1983) derived the mean and variance of the Rand (1971) index and of an index they called Bk (which is actually attributed to Ochiai, 1957) under fixed marginal totals of the matching counts matrix and independence of the clustering algorithms. This paper generalizes the Fowlkes and Mallows (1983) derivation of the mean and variance to any member of the L family, which makes comparing a wide family of indices much easier. Monte Carlo simulations are used to compare the shapes, means and variances of nine members of the L family for null case data (without clustering structure). Structured case simulations are used to evaluate the nine indices as tools for measuring cluster structure recovery. Data were generated from bivariate normal distributions.
  • Keywords
    Similarity index , clustering algorithm , Rand index , Matching counts , Cluster analysis
  • Journal title
    Journal of Statistical Planning and Inference
  • Serial Year
    2010
  • Record number

    2220902
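
The quantities named in the abstract can be illustrated with a minimal Python sketch. It builds the matching counts mij from two label vectors and evaluates the Rand index and the Fowlkes and Mallows Bk index using their standard textbook formulas. The function names `matching_counts` and `rand_and_bk` are hypothetical, chosen only for this illustration, and the sketch does not reproduce the paper's mean and variance derivations.

```python
from collections import Counter
from math import comb, sqrt


def matching_counts(labels_a, labels_b):
    """Return the matching counts m_ij as a Counter keyed by (i, j)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data set")
    return Counter(zip(labels_a, labels_b))


def rand_and_bk(labels_a, labels_b):
    """Compute the Rand index and the Fowlkes-Mallows B_k index.

    Assumes each clustering has at least one cluster with two or more
    points; otherwise B_k is undefined and a ValueError is raised.
    """
    m = matching_counts(labels_a, labels_b)
    n = len(labels_a)

    # Marginal totals: cluster sizes in each clustering.
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)

    # With the marginal totals held fixed, both indices depend on the
    # data only through the sum of squared matching counts.
    sum_sq = sum(v * v for v in m.values())
    t = sum_sq - n                                   # 2 * pairs together in both
    p = sum(v * v for v in row_totals.values()) - n  # 2 * pairs together in A
    q = sum(v * v for v in col_totals.values()) - n  # 2 * pairs together in B

    if p == 0 or q == 0:
        raise ValueError("B_k is undefined when a clustering has only singletons")

    rand = 1 + (t - (p + q) / 2) / comb(n, 2)
    bk = t / sqrt(p * q)
    return rand, bk


if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1]
    b = [0, 0, 1, 1, 2, 2]
    print(rand_and_bk(a, b))
```

Once `p` and `q` are fixed, both `rand` and `bk` are linear in `sum_sq`, which is the sense in which such indices are linear functions of the matching counts under fixed marginal totals.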