DocumentCode
714034
Title
A machine learning approach to identify DNA replication proteins from sequence-derived features
Author
Runtao Yang ; Chengjin Zhang ; Rui Gao ; Lina Zhang
Author_Institution
Sch. of Control Sci. & Eng., Shandong Univ., Jinan, China
fYear
2015
fDate
3-6 May 2015
Firstpage
13
Lastpage
18
Abstract
DNA replication, a critical step in cell division and proliferation, is the process of producing two identical replicas from one original DNA molecule. Although great advances have been made in DNA replication research, its detailed mechanism remains unresolved. Faithful DNA replication requires the cooperation of many proteins, and failures in replication leave mutations in the genome that can cause cancers and other diseases. Accurately identifying these DNA replication proteins may therefore help in understanding the molecular mechanisms of DNA replication and in drug development. Because experimental methods are expensive and labor intensive, an accurate computational method for identifying DNA replication proteins is highly desirable. In this paper, we develop a machine learning approach that identifies DNA replication proteins using a Naïve Bayes classifier and sequence-derived features. We investigate the prediction performance of features extracted from the Reduced Amino Acid Composition (RAAC) model and from two Pseudo Amino Acid Composition (PseAAC) models. The results indicate that the PseAAC (type 2) model yields the best performance. Using the PseAAC (type 2) features, we then compare our method with a similarity search method on an independent test dataset. The comparison shows that it is feasible to identify DNA replication proteins with machine learning algorithms. The proposed method may provide candidate DNA replication proteins for experimental verification, supporting both the study of DNA replication mechanisms and the development of drugs for treating human diseases.
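As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch turns protein sequences into composition features and classifies them with a Naive Bayes model. It is not the authors' method: it assumes a Gaussian Naive Bayes variant (the abstract does not say which likelihood was used), it uses plain 20-dimensional amino acid composition instead of RAAC or PseAAC (type 2) features, and the names `aac_features`, `GaussianNaiveBayes`, the `var_floor` parameter, and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(seq):
    """Return the 20-dim amino acid composition of a protein sequence.

    This corresponds only to the composition part of a PseAAC vector; the
    sequence-order correlation terms of PseAAC (type 2) are omitted here.
    Non-standard letters (e.g. X, B, Z) are ignored.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per (class, feature) pair."""

    def __init__(self, var_floor=1e-3):
        # Hypothetical fixed variance floor, added so that features that are
        # constant within a class do not produce zero variance.
        self.var_floor = var_floor

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        n_features = len(X[0])
        self.log_priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            means = [sum(r[j] for r in rows) / n for j in range(n_features)]
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / n + self.var_floor
                for j in range(n_features)
            ]
            self.log_priors_[c] = math.log(n / len(X))
            self.means_[c], self.vars_[c] = means, variances
        return self

    def _log_joint(self, x, c):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj): the naive conditional
        # independence assumption, evaluated in log space for stability.
        total = self.log_priors_[c]
        for xj, m, v in zip(x, self.means_[c], self.vars_[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return total

    def predict(self, X):
        # Pick the class with the highest log joint probability per sample.
        return [max(self.classes_, key=lambda c: self._log_joint(x, c)) for x in X]


# Toy, made-up sequences (NOT real proteins): label 1 = "replication protein",
# label 0 = "other protein". Real use would need a curated, labeled dataset.
train_seqs = ["KKRKRAKKRG", "RKKRKKGRKA", "KRRKAKKRKG",
              "DEEDLLDEAE", "EDLDEELDDA", "LEDDEALEED"]
train_labels = [1, 1, 1, 0, 0, 0]

model = GaussianNaiveBayes().fit([aac_features(s) for s in train_seqs], train_labels)
predictions = model.predict([aac_features("KRKKRRAK"), aac_features("EEDDLEDE")])
print(predictions)  # [1, 0]
```

In Chou's formulation, PseAAC (type 2) extends the composition vector with sequence-order correlation factors derived from hydrophobicity and hydrophilicity values; adding those factors to each feature vector would be the natural next step, but it is left out of this sketch.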
Keywords
Bayes methods; DNA; biology computing; drugs; genetics; learning (artificial intelligence); proteins; DNA molecule; DNA replication protein; cell division; drug development; genome; machine learning; molecular mechanism; naive Bayes classifier; pseudo amino acid composition; reduced amino acid composition; sequence-derived feature; Accuracy; Amino acids; DNA; Diseases; Feature extraction; Proteins; Sensitivity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on
Conference_Location
Halifax, NS
ISSN
0840-7789
Print_ISBN
978-1-4799-5827-6
Type
conf
DOI
10.1109/CCECE.2015.7129092
Filename
7129092