• DocumentCode
    3074429
  • Title
    A Novel Approach for Automatic Number of Clusters Detection in Microarray Data Based on Consensus Clustering
  • Author
    Vinh, Nguyen Xuan; Epps, Julien
  • Author_Institution
    Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Sydney, NSW, Australia
  • fYear
    2009
  • fDate
    22-24 June 2009
  • Firstpage
    84
  • Lastpage
    91
  • Abstract
    Estimating the true number of clusters in a data set is one of the major challenges in cluster analysis. Yet in certain domains, knowing the true number of clusters is of high importance. For example, in medical research, detecting the true number of groups and sub-groups of cancer would be of utmost importance for their effective treatment. In this paper we propose a novel method to estimate the number of clusters in a microarray data set based on the consensus clustering approach. Although the main objective of consensus clustering is to discover a robust and high-quality cluster structure in a data set, closer inspection of the set of clusterings obtained can often give valuable information about the appropriate number of clusters present. More specifically, the set of clusterings obtained when the specified number of clusters coincides with the true number of clusters tends to be less diverse. To quantify this diversity we develop a novel index, namely the Consensus Index (CI), which is built upon a suitable clustering similarity measure such as the well-known Adjusted Rand Index (ARI) or our recently developed information-theoretic index, the Adjusted Mutual Information (AMI). Our experiments on both synthetic and real microarray data sets indicate that the CI is a useful indicator for determining the appropriate number of clusters.
  • Keywords
    information theory; pattern clustering; adjusted mutual information; adjusted rand index; automatic number; cluster analysis; clustering similarity measure; clusters detection; consensus clustering approach; consensus index; high quality cluster structure; information theoretic based index; microarray data set; robust cluster structure; Bioinformatics; Biomedical engineering; Cancer detection; Clustering algorithms; Clustering methods; Inspection; Medical treatment; Mutual information; Robustness; Shape; adjusted mutual information (AMI); gene clustering; model selection; number of cluster detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and BioEngineering, 2009. BIBE '09. Ninth IEEE International Conference on
  • Conference_Location
    Taichung
  • Print_ISBN
    978-0-7695-3656-9
  • Type
    conf
  • DOI
    10.1109/BIBE.2009.19
  • Filename
    5211310
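The abstract above describes the Consensus Index only at a high level. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the CI for a candidate number of clusters K is the average pairwise ARI among the clusterings produced for that K, and that the K with the highest CI is selected. The function names (`adjusted_rand_index`, `consensus_index`, `choose_k`) are hypothetical. The clusterings are supplied by the caller, so no particular base clustering algorithm or resampling scheme is assumed. Only ARI is implemented here; AMI could be substituted as the similarity measure.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Dict, Hashable, List, Sequence

Labeling = Sequence[Hashable]


def adjusted_rand_index(u: Labeling, v: Labeling) -> float:
    """Adjusted Rand Index between two labelings of the same n items."""
    if len(u) != len(v):
        raise ValueError("labelings must have equal length")
    total = comb(len(u), 2)  # number of item pairs
    if total == 0:
        return 1.0
    # Pairs co-clustered in both labelings (from the contingency table).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    # Pairs co-clustered in u, and in v (row and column marginals).
    sum_a = sum(comb(c, 2) for c in Counter(u).values())
    sum_b = sum(comb(c, 2) for c in Counter(v).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def consensus_index(clusterings: List[Labeling]) -> float:
    """Assumed CI: mean pairwise ARI over an ensemble of clusterings."""
    pairs = list(combinations(clusterings, 2))
    if not pairs:
        raise ValueError("need at least two clusterings")
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)


def choose_k(ensembles: Dict[int, List[Labeling]]) -> int:
    """Return the K whose ensemble is least diverse (highest CI)."""
    return max(ensembles, key=lambda k: consensus_index(ensembles[k]))


if __name__ == "__main__":
    # Toy ensembles: for K=2 every run agrees (up to label permutation),
    # while the K=3 runs disagree with one another.
    ensembles = {
        2: [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]],
        3: [[0, 0, 1, 2, 2], [0, 1, 1, 2, 2], [0, 0, 1, 1, 2]],
    }
    for k, runs in ensembles.items():
        print(k, round(consensus_index(runs), 3))
    print("selected K:", choose_k(ensembles))
```

Because ARI is invariant to label permutation, the K=2 runs above score a CI of 1.0 even though their label values differ. Any other similarity measure plugged into `consensus_index` would need the same permutation invariance.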