• DocumentCode
    1114528
  • Title
    Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets
  • Author
    Gupta, Gunjan ; Liu, Alexander ; Ghosh, Joydeep
  • Author_Institution
    Amazon.com, Seattle, WA, USA
  • Volume
    7
  • Issue
    2
  • fYear
    2010
  • Firstpage
    223
  • Lastpage
    237
  • Abstract
    A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the cluster hierarchy that is useful for further interactive exploration. We present results on the Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
  • Keywords
    biology computing; genetics; molecular biophysics; 2D visualization; Gasch-Lee microarray data sets; automated clustering; automated hierarchical density shaving; fast hierarchical density-based clustering algorithm; gene-expression data sets; large biological data sets; rank individual clusters; unsupervised model selection strategy; visualization framework; Bioinformatics; Biological materials; Clustering algorithms; Data visualization; Kernel; Mass spectroscopy; Phylogeny; Proteins; Robustness; Stability criteria; Clustering; Data and knowledge visualization; Mining methods and algorithms; Algorithms; Cluster Analysis; Computational Biology; Data Mining; Databases, Genetic; Genes; Principal Component Analysis;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2008.32
  • Filename
    4479440