Title :
Clustering based on multiple biological information: approach for predicting protein complexes
Author :
Xiwei Tang ; Qilong Feng ; Jianxin Wang ; Yiming He ; Yi Pan
Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Abstract :
Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the insufficiency of the PPI data significantly lowers the accuracy of computational methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI) to discover protein complexes via the integration of multiple biological resources including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure is performed to eliminate redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To evaluate the performance of CMBI, the authors match the predicted complexes against reference complexes and compare the results with those of other techniques. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.
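As an illustration of the two quantities named in the abstract, the following minimal sketch computes an edge-clustering coefficient from a PPI adjacency map and a Pearson correlation coefficient from two expression profiles. The abstract does not give the formulas or say how CMBI combines the two values, so the sketch assumes the commonly used triangle-based edge-clustering definition and keeps the two measures separate. The function names, the toy network and the toy expression values are all hypothetical.

```python
from math import sqrt


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) over the maximum possible number.

    Assumed definition (not taken from the abstract):
    ECC(u, v) = z / min(deg(u) - 1, deg(v) - 1),
    where z is the number of common neighbours of u and v.
    """
    common = len(adj[u] & adj[v])
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / max_triangles if max_triangles > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; return 0.0 by convention.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Hypothetical toy PPI network: protein -> set of interaction partners.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

# Hypothetical expression profiles over three time points.
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
    "D": [3.0, 2.0, 1.0],
}

ecc_ab = edge_clustering_coefficient(ppi, "A", "B")
pcc_ab = pearson(expression["A"], expression["B"])
ecc_bd = edge_clustering_coefficient(ppi, "B", "D")
pcc_bd = pearson(expression["B"], expression["D"])
```

In this toy network the edge A-B lies in a triangle and its endpoints are co-expressed, whereas the edge B-D lies in no triangle and its endpoints are anti-correlated, which is the kind of contrast a similarity score built on these two measures could exploit.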
Keywords :
bioinformatics; correlation methods; filtering theory; genetics; genomics; molecular biophysics; pattern clustering; proteins; GO functional enrichment; GO-TermFinder; Pearson correlation coefficient; biological processes; computational methods; edge-clustering coefficient; gene expression profiles; matching analysis; multiple biological information clustering; multiple biological resources; protein complexes; protein-protein interaction; redundancy-filtering procedure
Journal_Title :
Systems Biology, IET
DOI :
10.1049/iet-syb.2012.0052