Title :
Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets
Author :
Gupta, Gunjan ; Liu, Alexander ; Ghosh, Joydeep
Author_Institution :
Amazon.com, Seattle, WA, USA
Abstract :
A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on the Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
Keywords :
biology computing; genetics; molecular biophysics; 2D visualization; Gasch-Lee microarray data sets; automated clustering; automated hierarchical density shaving; fast hierarchical density-based clustering algorithm; gene-expression data sets; large biological data sets; rank individual clusters; unsupervised model selection strategy; visualization framework; Bioinformatics; Biological materials; Clustering algorithms; Data visualization; Kernel; Mass spectroscopy; Phylogeny; Proteins; Robustness; Stability criteria; Clustering; Data and knowledge visualization; Mining methods and algorithms; Algorithms; Cluster Analysis; Computational Biology; Data Mining; Databases, Genetic; Genes; Principal Component Analysis;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2008.32