• DocumentCode
    29363
  • Title
    How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters
  • Author
    Baya, Ariel E. ; Granitto, Pablo M.
  • Author_Institution
    French Argentine Int. Center for Inf. & Syst. Sci., UPCAM, France
  • Volume
    10
  • Issue
    2
  • fYear
    2013
  • fDate
    March-April 2013
  • Firstpage
    401
  • Lastpage
    414
  • Abstract
    Clustering validation indexes are intended to assess the goodness of clustering results. Many methods used to estimate the number of clusters rely on a validation index as a key element to find the correct answer. This paper presents a new validation index based on graph concepts, which has been designed to find arbitrary-shaped clusters by exploiting the spatial layout of the patterns and their clustering labels. This new clustering index is combined with a solid statistical detection framework, the gap statistic. The resulting method is able to find the right number of arbitrary-shaped clusters in diverse situations, as we show with examples where this information is available. A comparison with several relevant validation methods is carried out using artificial and gene expression data sets. The results are very encouraging, showing that the underlying structure in the data can be more accurately detected with the new clustering index. Our gene expression data results also indicate that this new index is stable under perturbation of the input data.
  • Keywords
    bioinformatics; genetics; genomics; pattern clustering; statistical analysis; arbitrary-shaped cluster; artificial data set; clustering validation index; gap statistic; gene expression data set; graph concept; perturbation; statistical detection framework; Algorithm design and analysis; Bars; Clustering algorithms; Equations; Indexes; Kernel; Shape; Validation index; clustering; genomic data; Algorithms; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Genomics; Humans; Neoplasms; Reproducibility of Results
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.32
  • Filename
    6506069