  • DocumentCode
    714034
  • Title
    A machine learning approach to identify DNA replication proteins from sequence-derived features
  • Author
    Runtao Yang; Chengjin Zhang; Rui Gao; Lina Zhang
  • Author_Institution
    Sch. of Control Sci. & Eng., Shandong Univ., Jinan, China
  • fYear
    2015
  • fDate
    3-6 May 2015
  • Firstpage
    13
  • Lastpage
    18
  • Abstract
    DNA replication, a critical step in cell division and proliferation, is the process of producing two identical replicas from one original DNA molecule. Although great advances have been made in DNA replication research, its detailed mechanism remains unresolved. Faithful DNA replication requires the cooperation of many proteins, and failures in replication introduce mutations into the genome that can cause cancers and other diseases. Accurately identifying DNA replication proteins may therefore aid understanding of the molecular mechanisms of DNA replication and support drug development. Because experimental methods are expensive and labor intensive, an accurate computational method for identifying DNA replication proteins is highly desirable. In this paper, a machine learning approach to identifying DNA replication proteins is developed using a Naïve Bayes classifier and sequence-derived features. The prediction performance of features extracted from the Reduced Amino Acid Composition (RAAC) model and from two Pseudo Amino Acid Composition (PseAAC) models is investigated separately. Prediction results indicate that the PseAAC (type 2) model yields the best performance. Based on the PseAAC (type 2) model, we then compare our method with the similarity search method on the independent test dataset. The comparison results show that it is feasible to identify DNA replication proteins with machine learning algorithms. The proposed method may provide candidate DNA replication proteins for future experimental verification, helping to elucidate the molecular mechanisms of DNA replication and to develop drugs for the treatment of human diseases.
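    The abstract pairs PseAAC (type 2) features with a Naïve Bayes classifier. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. It follows the commonly cited amphiphilic (type 2) PseAAC formulation: 20 normalized composition terms followed by lag-wise correlation factors built from standardized physicochemical scales. The record gives no property scales, λ, weight w, or Naïve Bayes variant, so the defaults here (lam=2, w=0.05), the caller-supplied scales, and the choice of a Gaussian Naïve Bayes are all assumptions. The names pseaac_type2, standardize, and GaussianNB are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Map a raw property scale to zero mean, unit (population) std over the 20 residues."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pseaac_type2(seq, scales, lam=2, w=0.05):
    """Amphiphilic (type 2) PseAAC vector of length 20 + len(scales) * lam.

    Assumption: `scales` are already standardized property tables
    (e.g. hydrophobicity and hydrophilicity), supplied by the caller.
    """
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]  # drop non-standard letters
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # Correlation factors: for each lag j, one factor per property scale.
    taus = []
    for j in range(1, lam + 1):
        for s in scales:
            taus.append(sum(s[residues[i]] * s[residues[i + j]] for i in range(n - j)) / (n - j))
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


class GaussianNB:
    """Minimal Gaussian Naive Bayes (assumed variant; the record does not specify one)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # keeps zero-variance features finite

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + self.var_smoothing
                for col, m in zip(cols, means)
            ]
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            best, best_score = None, -math.inf
            for c in self.classes:
                log_prior, means, variances = self.stats[c]
                # Sum of per-feature Gaussian log-likelihoods plus the log prior.
                score = log_prior
                for v, m, var in zip(x, means, variances):
                    score -= 0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)
                if score > best_score:
                    best, best_score = c, score
            preds.append(best)
        return preds
```

    In use, one would standardize two real property scales, call pseaac_type2 on each labeled sequence, fit GaussianNB on the resulting vectors with 1/0 labels for replication and non-replication proteins, and call predict on held-out sequences. By construction, each feature vector sums to 1, which makes a quick sanity check.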
  • Keywords
    Bayes methods; DNA; biology computing; drugs; genetics; learning (artificial intelligence); proteins; DNA molecule; DNA replication protein; cell division; drug development; genome; machine learning; molecular mechanism; naive Bayes classifier; pseudo amino acid composition; reduced amino acid composition; sequence-derived feature; Accuracy; Amino acids; DNA; Diseases; Feature extraction; Proteins; Sensitivity;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on
  • Conference_Location
    Halifax, NS
  • ISSN
    0840-7789
  • Print_ISBN
    978-1-4799-5827-6
  • Type
    conf
  • DOI
    10.1109/CCECE.2015.7129092
  • Filename
    7129092