Title :
Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach
Author :
Huang, Chengbang ; Morcos, Faruck ; Kanaan, Simon P. ; Wuchty, Stefan ; Chen, Danny Z. ; Izaguirre, Jesús A.
Author_Institution :
Dept. of Comput. Sci., Notre Dame Univ., IN
Abstract :
One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, maximum specificity set cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the maximum likelihood estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi-cse.nd.edu
Keywords :
biochemistry; biology computing; cellular biophysics; enzymes; greedy algorithms; maximum likelihood estimation; molecular biophysics; RNA polymerase II; S. cerevisiae; cellular protein interactions; domain interactions; greedy algorithm; maximum likelihood estimation; maximum specificity set cover; molecular systems; proteasome; protein coexpression; protein domains; protein functional homogeneity; protein-protein interactions; proteome; Bioinformatics; Concrete; Genomics; Greedy algorithms; Large-scale systems; Maximum likelihood estimation; Organisms; Proteins; RNA; Sequences; Computations on discrete structures; bioinformatics (genome or protein) databases; biology; genetics.; graph algorithms; Algorithms; Computational Biology; Databases, Protein; Gene Expression Profiling; Internet; Likelihood Functions; Models, Statistical; Oligonucleotide Array Sequence Analysis; Proteasome Endopeptidase Complex; Protein Interaction Mapping; Protein Structure, Tertiary; Proteins; Proteomics; RNA Polymerase II; Saccharomyces cerevisiae Proteins;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2007.1001