Title :
Mining sequence features for DNA-binding site prediction
Author :
Hu, Jing ; Yan, Changhui
Author_Institution :
Dept. of Comput. Sci., Utah State Univ., Logan, UT
Abstract :
Protein-DNA interactions play pivotal roles in gene regulation and in DNA replication and repair. Because the 3-dimensional structures of most proteins are still unknown, computational methods that can identify DNA-binding sites from protein sequences are in demand. In this study, we used a greedy method to search for features that are useful for identifying DNA-binding sites. Five features were selected from a pool of 534. Using these five features, a Naive Bayes method achieved a Matthews correlation coefficient (MCC) of 0.31, an improvement over a previous method that used only two features as input. Because all five features can be derived from protein sequences, the proposed method can identify DNA-binding sites using only protein sequences as input.
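The abstract does not spell out the greedy search or its stopping rule, so the following is only a minimal sketch of one common variant: greedy forward selection that adds the single best-scoring feature each round and stops when the score no longer improves or a size cap is reached. The names `mcc`, `greedy_forward_selection`, and the `score` callback are hypothetical. In a setting like the one described, `score` would presumably return the cross-validated MCC of a Naive Bayes classifier trained on the candidate subset, but that is an assumption, not the authors' documented procedure.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = binding, 0 = non-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def greedy_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Greedily grow a feature subset, one feature per round.

    `score` is a hypothetical callback that evaluates a list of feature
    indices (for example, by cross-validated MCC of a classifier).
    The stopping rule (no improvement, or `max_features` reached) is an
    assumption of this sketch.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda s: s[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score
```

With a pool of 534 candidate features and a cap of 5, a call would look like `greedy_forward_selection(534, score, 5)`. Each round evaluates every remaining feature once, so the cost grows roughly with the pool size times the number of features kept.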
Keywords :
DNA; biology computing; data mining; genetics; molecular biophysics; 3-dimensional structure; DNA repair; DNA replication; DNA-binding site prediction; Matthews correlation coefficient; Naive Bayes method; gene regulation; mining sequence; protein-DNA interactions; proteins; Amino acids; DNA computing; Electrostatics; Entropy; Genetics; Neural networks; Proteins; Sequences; Shape; Solvents;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2008. CIBCB '08. IEEE Symposium on
Conference_Location :
Sun Valley, ID
Print_ISBN :
978-1-4244-1778-0
Electronic_ISBN :
978-1-4244-1779-7
DOI :
10.1109/CIBCB.2008.4675782