Title :
Sequence-based Protein-Protein Interaction Prediction Optimized for Target Selection in Biological Experiments
Author :
Ye, Jiankuan ; Kulikowski, Casimir ; Muchnik, Ilya
Author_Institution :
Dept. of Comput. Sci., Rutgers Univ., Piscataway, NJ
fDate :
2005
Abstract :
A set of protein pairs predicted to interact, with a high ratio of true positives, is valuable for target selection in experiments such as protein structure determination. Our goal in this paper is to investigate the problem of finding such a set of protein pairs in an organism using machine learning methods. The yeast genome was used for this study, and a support vector machine (SVM) was adopted as the classification model. Domain information for each protein was extracted and transformed into features of a protein pair. We specifically analyzed the effect of selecting negative samples according to different principles. We also evaluated the feasibility of adjusting the intercept parameter of a trained SVM model to improve the ratio of true positives among the predictions. Our results show that the approximate 1:3 ratio of positive to negative samples in the testing data can be significantly improved to 2:1 positive to negative in the predicted data.
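The intercept adjustment mentioned in the abstract can be illustrated with a minimal sketch. It assumes the trained SVM yields a decision value f(x) = w·x + b for each protein pair, and that shifting b amounts to subtracting an offset from every decision value before taking the sign. The function name `precision_at_offset`, the scores, and the labels are hypothetical and chosen only to mirror the 1:3 and 2:1 ratios; they are not the paper's data or code.

```python
def precision_at_offset(scores, labels, offset):
    """Precision of the rule 'predict interacting when score - offset > 0'.

    Subtracting `offset` from each decision value is equivalent to
    lowering the SVM intercept b by `offset` (an assumption of this sketch).
    """
    predicted = [y for s, y in zip(scores, labels) if s - offset > 0]
    if not predicted:
        return None  # no pair is predicted as interacting
    return sum(predicted) / len(predicted)


# Hypothetical decision values for 8 test pairs (1 = interacting, 0 = not).
scores = [2.1, 1.4, 1.1, 0.6, 0.3, 0.2, -0.4, -0.9]
labels = [1, 0, 1, 0, 0, 0, 0, 0]

base_rate = sum(labels) / len(labels)                 # 2 of 8 pairs, i.e. 1:3
default_precision = precision_at_offset(scores, labels, 0.0)
shifted_precision = precision_at_offset(scores, labels, 1.0)

print(f"positives in test data: {base_rate:.2f}")
print(f"precision, unshifted intercept: {default_precision:.2f}")
print(f"precision, shifted intercept: {shifted_precision:.2f}")
```

In this toy data the shifted rule keeps three pairs, two of them true positives, which corresponds to a 2:1 ratio. The price is that fewer pairs are predicted overall, which suits target selection, where a short but reliable list is preferred.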
Keywords :
biochemistry; biology computing; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; support vector machines; machine learning; negative sample selection; sequence-based protein-protein interaction prediction; support vector machine; target selection; yeast genome; Bioinformatics; Data mining; Fungi; Genomics; Learning systems; Organisms; Predictive models; Proteins; Support vector machine classification; Support vector machines;
Conference_Title :
Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the
Conference_Location :
Shanghai
Print_ISBN :
0-7803-8741-4
DOI :
10.1109/IEMBS.2005.1616387