• DocumentCode
    1167435
  • Title
    A Novel Approach for Discovering Overlapping Clusters in Gene Expression Data
  • Author
    Ma, P.C.H.; Chan, Keith C. C.
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
  • Volume
    56
  • Issue
    7
  • fYear
    2009
  • fDate
    7/1/2009
  • Firstpage
    1803
  • Lastpage
    1809
  • Abstract
    Many existing clustering algorithms have been used to identify coexpressed genes in gene expression data. These algorithms mainly partition the data, in the sense that each gene is allowed to belong to only one cluster. However, since proteins typically interact with different groups of proteins in order to serve different biological roles, the genes that produce these proteins are expected to coexpress with more than one group of genes. In other words, some genes are expected to belong to more than one cluster. This poses a challenge for gene expression data clustering, as overlapping clusters need to be discovered in a noisy environment. For this task, we propose in this paper an effective information-theoretic approach that consists of an initial clustering phase followed by a reclustering phase. The proposed approach has been tested with both simulated and real expression data. Experimental results show that it can improve the performance of existing clustering algorithms and can uncover interesting patterns in noisy gene expression data so that, based on these patterns, overlapping clusters can be discovered.
  • Keywords
    bioinformatics; genetics; molecular biophysics; proteins; gene expression data; noisy environment; overlapping clustering algorithm; proteins interaction; reclustering phase analysis; Biological information theory; Clustering algorithms; Data mining; Gene expression; Genetic communication; Information theory; Partitioning algorithms; RNA; Testing; Working environment noise; gene expression data clustering; Algorithms; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Models, Genetic; Models, Statistical; Yeasts
  • fLanguage
    English
  • Journal_Title
    Biomedical Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9294
  • Type
    jour
  • DOI
    10.1109/TBME.2009.2015055
  • Filename
    4785521