DocumentCode :
1458442
Title :
k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents
Author :
Ambert, K.H. ; Cohen, A.M.
Author_Institution :
Dept. of Med. Inf. & Clinical Epidemiology, Oregon Health & Sci. Univ., Portland, OR, USA
Volume :
9
Issue :
1
fYear :
2012
Firstpage :
305
Lastpage :
310
Abstract :
Although publicly accessible databases containing protein-protein interaction (PPI)-related information are important resources to bench and in silico research scientists alike, the amount of time and effort required to keep them up to date is often burdensome. Text-mining tools from the machine learning discipline can be applied to help identify relevant PPI publications. Here, we describe and evaluate two document classification algorithms that we submitted to the BioCreative II.5 PPI Classification Challenge Task. This task asked participants to design classifiers for identifying documents containing PPI-related information in the primary literature, and evaluated them against one another. One of our systems was the overall best-performing system submitted to the challenge task. It utilizes a novel approach to k-nearest neighbor classification, which we describe here, and we compare its performance to that of two support vector machine-based classification systems, one of which was also evaluated in the challenge task.
Keywords :
biology computing; document handling; learning (artificial intelligence); molecular biophysics; proteins; BioCreative II.5 PPI classification challenge task; document classification algorithms; k-information gain scaled nearest neighbor; machine learning discipline; protein-protein interaction-related documents; Bioinformatics; Computational biology; Databases; Electronic mail; Proteins; Support vector machines; Training; Protein-protein interaction; information gain; k-nearest neighbor; support vector machine; text classification; Computational Biology; Computer Simulation; Databases, Protein; Protein Interaction Maps; Reproducibility of Results; Support Vector Machines;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.32
Filename :
5719600