  • DocumentCode
    1016360
  • Title
    Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering
  • Author
    Cho, Hyuk; Dhillon, Inderjit S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX
  • Volume
    5
  • Issue
    3
  • fYear
    2008
  • Firstpage
    385
  • Lastpage
    400
  • Abstract
    There is a consensus in microarray analysis that identifying potential local patterns, characterized by coherent groups of genes and conditions, may shed light on previously undetectable cellular processes involving those genes, as well as on macroscopic phenotypes of the related samples. To cluster genes and conditions simultaneously, we previously developed a fast coclustering algorithm, minimum sum-squared residue coclustering (MSSRCC), which employs an alternating minimization scheme and generates coclusters in what we call a "checkerboard" structure. In this paper, we propose specific strategies that enable MSSRCC to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms. The strategies include binormalization, deterministic spectral initialization, and incremental local search. We assess the effects of these strategies on both synthetic gene expression data sets and real human cancer microarrays and provide empirical evidence that MSSRCC with the proposed strategies performs better than existing coclustering and clustering algorithms. In particular, combining all three strategies leads to the best performance. Furthermore, we illustrate the coherence of the resulting coclusters in a checkerboard structure, where genes in a cocluster manifest the phenotype structure of the corresponding samples, and we evaluate the enrichment of functional annotations in Gene Ontology (GO).
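    As a rough illustration of the alternating minimization idea in the abstract, the Python sketch below coclusters a small matrix into a checkerboard of k row clusters by l column clusters. It assumes the simple block-mean residue h_ij = a_ij - a_IJ, where a_IJ is the mean of the cocluster containing entry (i, j). That residue choice and every function name here (block_means, residue, update_rows, update_cols, cocluster) are illustrative assumptions, not the paper's implementation. The sketch also omits binormalization, spectral initialization, and incremental local search, so it can still stall in poor local minima or leave clusters empty.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def block_means(A: Matrix, rows: List[int], cols: List[int], k: int, l: int) -> Matrix:
    """Mean a_IJ of each cocluster; empty coclusters get a placeholder mean of 0.0."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            sums[rows[i]][cols[j]] += a
            counts[rows[i]][cols[j]] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(l)]
            for r in range(k)]


def residue(A: Matrix, rows: List[int], cols: List[int], k: int, l: int) -> float:
    """Sum-squared residue, assuming h_ij = a_ij - a_IJ (hypothetical choice)."""
    M = block_means(A, rows, cols, k, l)
    return sum((a - M[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(A) for j, a in enumerate(row))


def update_rows(A: Matrix, rows: List[int], cols: List[int], k: int, l: int) -> List[int]:
    """Column clusters fixed: move each row to the row cluster whose block means fit it best."""
    M = block_means(A, rows, cols, k, l)
    new_rows = []
    for row in A:
        costs = [sum((a - M[r][cols[j]]) ** 2 for j, a in enumerate(row)) for r in range(k)]
        new_rows.append(min(range(k), key=costs.__getitem__))
    return new_rows


def update_cols(A: Matrix, rows: List[int], cols: List[int], k: int, l: int) -> List[int]:
    """Row clusters fixed: move each column to the column cluster whose block means fit it best."""
    M = block_means(A, rows, cols, k, l)
    new_cols = []
    for j in range(len(A[0])):
        costs = [sum((A[i][j] - M[rows[i]][c]) ** 2 for i in range(len(A))) for c in range(l)]
        new_cols.append(min(range(l), key=costs.__getitem__))
    return new_cols


def cocluster(A: Matrix, rows: List[int], cols: List[int], k: int, l: int,
              max_iter: int = 20, tol: float = 1e-12) -> Tuple[List[int], List[int], float]:
    """Alternate row and column updates until the residue stops decreasing."""
    obj = residue(A, rows, cols, k, l)
    for _ in range(max_iter):
        rows = update_rows(A, rows, cols, k, l)
        cols = update_cols(A, rows, cols, k, l)
        new_obj = residue(A, rows, cols, k, l)
        converged = obj - new_obj <= tol
        obj = new_obj
        if converged:
            break
    return rows, cols, obj


if __name__ == "__main__":
    A = [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 2, 2], [9, 9, 2, 2]]
    rows, cols, obj = cocluster(A, [0, 1, 0, 1], [0, 1, 1, 0], k=2, l=2)
    print(rows, cols, obj)
```

    Each update reassigns rows (then columns) against the current block means and then recomputes those means, so neither step can increase the sum-squared residue; the loop stops once the objective no longer decreases or max_iter is reached.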
  • Keywords
    cancer; cellular biophysics; genetics; medical computing; pattern clustering; tumours; binormalization; biological cellular processes; checkerboard structure; cluster genes; deterministic spectral initialization; gene ontology; human cancer microarrays; incremental local search; macroscopic phenotypes; microarray analysis; minimization scheme; minimum sum-squared residue coclustering; partitional clustering algorithms; synthetic gene expression; co-clustering; local search; Algorithms; Cluster Analysis; Gene Expression Profiling; Humans; Least-Squares Analysis; Neoplasm Proteins; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Tumor Markers, Biological
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.70268
  • Filename
    4407679