DocumentCode :
178468
Title :
Microenvironment-Based Protein Function Analysis by Random Forest
Author :
Okada, K. ; Flores, L. ; Wong, M. ; Petkovic, D.
Author_Institution :
Comput. Sci. Dept., San Francisco State Univ., San Francisco, CA, USA
fYear :
2014
fDate :
24-28 Aug. 2014
Firstpage :
3138
Lastpage :
3143
Abstract :
Machine learning-based prediction of protein functions plays a key role in bioinformatics and pharmaceutical research, facilitating swift discovery of new drugs in high-throughput settings. This paper presents an adaptation of Random Forest to structure-based protein function prediction. Our system represents a protein's 3D physicochemical structural information in microenvironment descriptors whose spatial resolution is much finer than that of sequence-based protein descriptors. We prepare our datasets for seven active sites from five protein function classes by using multiple public data banks and train Random Forest classifiers to identify these seven function models in proteins. This paper presents two experimental studies: 1) a 5-fold stratified cross-validation comparing Random Forest with Naive Bayes and Support Vector Machine and 2) a systematic comparison of Random Forest's two variable importance measures. The promising results of these studies demonstrate the potential of Random Forest to improve the accuracy of current protein function assays.
Keywords :
Bayes methods; bioinformatics; learning (artificial intelligence); pattern classification; proteins; support vector machines; 5-fold stratified cross-validation; bioinformatics; machine learning-based prediction; microenvironment-based protein function analysis; multiple public data banks; naive Bayes; pharmaceutical research; random forest classifiers; structure-based protein function prediction; support vector machine; Accuracy; Computational modeling; Niobium; Proteins; Radio frequency; Support vector machines; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
ISSN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2014.541
Filename :
6977253