DocumentCode
1167435
Title
A Novel Approach for Discovering Overlapping Clusters in Gene Expression Data
Author
Ma, P.C.H. ; Chan, Keith C C
Author_Institution
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
Volume
56
Issue
7
fYear
2009
fDate
7/1/2009
Firstpage
1803
Lastpage
1809
Abstract
Many existing clustering algorithms have been used to identify coexpressed genes in gene expression data. These algorithms mainly partition the data, in the sense that each gene is allowed to belong to only one cluster. Since proteins typically interact with different groups of proteins to serve different biological roles, the genes that encode these proteins are expected to coexpress with more than one group of genes. In other words, some genes are expected to belong to more than one cluster. This poses a challenge for gene expression data clustering, because overlapping clusters must be discovered in a noisy environment. For this task, we propose an effective information-theoretic approach that consists of an initial clustering phase followed by a reclustering phase. The proposed approach has been tested with both simulated and real expression data. Experimental results show that it can improve the performance of existing clustering algorithms and can effectively uncover interesting patterns in noisy gene expression data, from which overlapping clusters can be discovered.
Keywords
bioinformatics; genetics; molecular biophysics; proteins; bioinformatics; gene expression data; noisy environment; overlapping clustering algorithm; proteins interaction; reclustering phase analysis; Biological information theory; Clustering algorithms; Data mining; Gene expression; Genetic communication; Information theory; Partitioning algorithms; Proteins; RNA; Testing; Working environment noise; Bioinformatics; data mining; gene expression data clustering; information theory; Algorithms; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression; Gene Expression Profiling; Information Theory; Models, Genetic; Models, Statistical; Yeasts;
fLanguage
English
Journal_Title
Biomedical Engineering, IEEE Transactions on
Publisher
ieee
ISSN
0018-9294
Type
jour
DOI
10.1109/TBME.2009.2015055
Filename
4785521