DocumentCode :
83349
Title :
Constructing Query-Driven Dynamic Machine Learning Model With Application to Protein-Ligand Binding Sites Prediction
Author :
Dong-Jun Yu ; Jun Hu ; Qian-Mu Li ; Zhen-Min Tang ; Jing-Yu Yang ; Hong-Bin Shen
Author_Institution :
Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
Volume :
14
Issue :
1
fYear :
2015
fDate :
Jan. 2015
Firstpage :
45
Lastpage :
58
Abstract :
We are entering an era in which annotated biological data are generated rapidly and continuously. Effectively incorporating newly annotated data into the learning step is crucial for enhancing the performance of a bioinformatics prediction model. Although machine-learning-based methods have been used extensively for various biological problems, existing approaches usually train static prediction models on fixed training datasets. Such static approaches have several disadvantages, including low scalability and impracticality when the training dataset is huge. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and existing approaches is that the training set for the machine learning algorithm is generated dynamically according to the query input, as opposed to training a general model regardless of the query, as traditional static methods do. Accordingly, a query-driven predictor built on a smaller set of data specifically selected from the entire annotated base dataset is applied to the query. This way of constructing the dynamic model allows the annotated base dataset to be updated flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, exhibiting a “part could be better than all” phenomenon. Following the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on 10 different ligand types at three hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the dynamic framework is a promising direction for bridging the gap between rapidly accumulating annotated biological data and effective machine-learning-based predictors.
OSML web server and datasets are freely available at: http://www.csbio.sjtu.edu.cn/bioinf/OSML/ for academic use.
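The query-driven idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how OSML selects the relevant subset or which learner it trains, so everything below is an assumption made for illustration: Euclidean distance as the relevance measure, a nearest-centroid classifier as the learner, and the hypothetical function names `select_core_subset`, `train_centroids`, and `predict_query_driven`. The toy feature vectors and labels are invented and are not OSML data.

```python
import math
from collections import defaultdict


def select_core_subset(base, query, k):
    """Return the k base samples closest to the query.

    Assumption: Euclidean distance stands in for whatever relevance
    measure the actual method uses.
    """
    return sorted(base, key=lambda sample: math.dist(sample[0], query))[:k]


def train_centroids(samples):
    """Fit a nearest-centroid model: one mean feature vector per label.

    Assumption: this simple learner is a placeholder for the real one.
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict_query_driven(base, query, k=4):
    """Train on the query-specific subset only, then predict the query."""
    subset = select_core_subset(base, query, k)
    centroids = train_centroids(subset)
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


# Toy annotated base dataset: (feature vector, label) pairs.
base = [
    ((0.0, 0.0), "non-binding"), ((0.2, 0.1), "non-binding"),
    ((1.0, 1.0), "binding"), ((0.9, 1.2), "binding"),
    ((5.0, 5.0), "non-binding"), ((5.2, 4.9), "non-binding"),
]

# New annotations are simply appended to the base; no global retraining,
# because each model is built at query time.
base.append(((1.1, 0.9), "binding"))

print(predict_query_driven(base, (1.0, 1.1), k=3))  # prints "binding"
```

Because a model is built per query, updating the annotated base only requires appending records. The cost moves to prediction time, where each query triggers a subset selection and a small training run.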
Keywords :
Internet; biochemistry; bioinformatics; data mining; information retrieval systems; learning (artificial intelligence); proteins; question answering (information retrieval); OSML dataset; OSML prediction; OSML predictor; OSML web server; annotated base dataset; annotated biological data; annotated dataset base; bioinformatics prediction model; dataset selection; dataset training; dynamic learning framework; dynamic machine learning model; dynamic model construction; dynamic protein-ligand binding site predictor; fixed training dataset; ligand-type computer experiment; machine learning algorithm; machine learning model construction; machine learning-based predictor; machine-learning-based method; on-site model for ligand binding site dataset; on-site model for ligand binding site prediction; on-site model for ligand binding site predictor; on-site model for ligand binding site web server; protein-ligand binding site prediction; query-driven machine learning model; query-driven prediction model construction; query-driven predictor; static approach disadvantage; static approach scalability; static prediction model; traditional static method; Biological system modeling; Data models; Feature extraction; Predictive models; Proteins; Training; Dynamic learning framework; OSML; machine learning; query-driven prediction model;
fLanguage :
English
Journal_Title :
NanoBioscience, IEEE Transactions on
Publisher :
IEEE
ISSN :
1536-1241
Type :
jour
DOI :
10.1109/TNB.2015.2394328
Filename :
7051329