DocumentCode :
3410835
Title :
Large scale testing of chemical shift prediction algorithms and improved machine learning-based approaches to shift prediction
Author :
Arun, K. ; Langmead, Christopher James
Author_Institution :
Dept. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2004
fDate :
16-19 Aug. 2004
Firstpage :
712
Lastpage :
713
Abstract :
The resonant frequencies, or chemical shifts, of nuclear magnetic resonance (NMR) active nuclei in proteins are determined by covalent and through-space interactions and, more generally, the electronic environment surrounding each nucleus. However, the precise nature of the correlation between protein three-dimensional (3D) structure and chemical shift remains largely unresolved. Thus, chemical shift prediction is a non-trivial task. This study tests the accuracy of three existing structure-based chemical shift prediction algorithms (SHIFTS, SHIFTX, PROSHIFT) against REFDB, a large database of experimentally determined and manually re-referenced 1H, 13C, and 15N chemical shifts. We report that the accuracy of backbone chemical shift predictions for each program is lower than that originally reported. This suggests these programs over-fit the data used in their construction. We then compare two novel methods for chemical shift prediction based on support vector machines (SVM) and bagging, respectively. Each method was trained on REFDB using predictions made by SHIFTS, SHIFTX, and PROSHIFT as features. In cross-validated experiments, bagging is shown to be superior to SVMs, while both methods are substantially better than SHIFTS, SHIFTX, and PROSHIFT. Our results suggest that meta-methods yield increased accuracy for chemical shift prediction.
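The meta-method idea in the abstract (train a learner on the outputs of several existing predictors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than REFDB, the three noisy feature columns are hypothetical stand-ins for SHIFTS, SHIFTX, and PROSHIFT outputs, the base learner inside the bagging ensemble is ordinary least squares (the abstract does not say which base learner the authors used), and a single train/test split replaces the cross-validation described in the paper.

```python
import random
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [w0, w1, ..., wd]."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)

def predict_ols(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit_bagging(X, y, n_models=25, rng=None):
    """Bagging: fit one base model per bootstrap resample of the training set."""
    rng = rng or random.Random(0)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_ols([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Average the predictions of all bootstrap models."""
    return sum(predict_ols(w, x) for w in models) / len(models)

def rmse(pred, true):
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def make_data(n, rng):
    """Synthetic shifts (ppm) plus three hypothetical base-predictor outputs."""
    X, y = [], []
    for _ in range(n):
        t = rng.uniform(100.0, 130.0)  # assumed range, illustration only
        X.append((t + 1.0 + rng.gauss(0, 2.0),   # stand-in predictor A (biased)
                  t + rng.gauss(0, 2.5),         # stand-in predictor B
                  t - 0.5 + rng.gauss(0, 3.0)))  # stand-in predictor C (biased)
        y.append(t)
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(300, rng)
X_test, y_test = make_data(200, rng)

models = fit_bagging(X_train, y_train, n_models=25, rng=random.Random(1))
meta_rmse = rmse([predict_bagging(models, x) for x in X_test], y_test)
base_rmse = [rmse([x[k] for x in X_test], y_test) for k in range(3)]

print("base predictor RMSE:", [round(v, 2) for v in base_rmse])
print("bagged meta-predictor RMSE:", round(meta_rmse, 2))
```

On this synthetic data the combined predictor benefits from averaging out independent errors and correcting constant offsets in the base predictors. The numbers it prints say nothing about the accuracies reported in the paper.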
Keywords :
biological NMR; biology computing; chemical shift; learning (artificial intelligence); molecular biophysics; proteins; support vector machines; PROSHIFT; REFDB; SHIFTS; SHIFTX; active nuclei; bagging; chemical shift prediction algorithms; machine learning; nuclear magnetic resonance; protein three-dimensional structure; resonant frequencies; Chemicals; Large-scale systems; Machine learning algorithms; Prediction algorithms; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
Type :
conf
DOI :
10.1109/CSB.2004.1332556
Filename :
1332556