DocumentCode :
1229721
Title :
On the Importance of Comprehensible Classification Models for Protein Function Prediction
Author :
Freitas, Alex A. ; Wieser, Daniela C. ; Apweiler, Rolf
Author_Institution :
Comput. Lab., Univ. of Kent, Canterbury, UK
Volume :
7
Issue :
1
Year :
2010
Firstpage :
172
Lastpage :
182
Abstract :
The literature on protein function prediction is currently dominated by works aimed at maximizing predictive accuracy, ignoring the important issues of validation and interpretation of discovered knowledge, which can lead to new insights and hypotheses that are biologically meaningful and advance the understanding of protein functions by biologists. The overall goal of this paper is to critically evaluate this approach, offering a refreshing new perspective on this issue, focusing not only on predictive accuracy but also on the comprehensibility of the induced protein function prediction models. More specifically, this paper aims to offer two main contributions to the area of protein function prediction. First, it presents the case for discovering comprehensible protein function prediction models from data, discussing in detail the advantages of such models, namely, increasing the confidence of the biologist in the system's predictions, leading to new insights about the data and the formulation of new biological hypotheses, and detecting errors in the data. Second, it presents a critical review of the pros and cons of several different knowledge representations that can be used in order to support the discovery of comprehensible protein function prediction models.
Keywords :
bioinformatics; classification; knowledge representation; molecular biophysics; physiological models; proteins; comprehensible classification models; data error detection; protein function prediction; review; Biology; Classifier design and evaluation; Induction; Machine learning; Algorithms; Amino Acid Sequence; Computer Simulation; Models, Biological; Models, Chemical; Molecular Sequence Data; Pattern Recognition, Automated; Sequence Analysis, Protein; Structure-Activity Relationship
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2008.47
Filename :
4527204