• DocumentCode
    419338
  • Title

    Predicting gene ontology annotations from sequence data using kernel-based machine learning algorithms

  • Author

    Ward, J.J. ; Sodhi, J.S. ; Buxton, B.F. ; Jones, D.T.

  • Author_Institution
    Univ. Coll. London, UK
  • fYear
    2004
  • fDate
    16-19 Aug. 2004
  • Firstpage
    529
  • Lastpage
    530
  • Abstract
    In this early part of the post-genomic era, the inference of the functions associated with gene products is a necessary first step in understanding the development and maintenance of living cells. We describe the development of a machine learning method for predicting biological process as defined by the gene ontology (GO). The algorithm uses features that can be generated from amino acid sequence alone, and does not require further experimental studies such as microarrays, 2-hybrid screens or systematic 'pull-down' assays. The budding yeast Saccharomyces cerevisiae is used because of its comprehensive set of functional annotations, but the approach is sufficiently general for application to other eukaryote genomes. The input data include phylogenetic profiles, which represent the distribution of orthologous proteins in the genomes of other organisms, position-specific scoring matrices, and secondary structure and dynamic disorder predictions. These are encoded using diffusion kernels, which are used to represent pair-wise relationships such as sequence or secondary structure element similarity between nodes (proteins) in a graph. These kernels are benchmarked on the process prediction problem using a maximal margin (SVM) learning algorithm.
  • Keywords
    biology computing; genetics; learning (artificial intelligence); molecular biophysics; proteins; support vector machines; 2-hybrid screens; Saccharomyces cerevisiae; amino acid sequence; biological process; diffusion kernels; dynamic disorder predictions; encoding; eukaryote genomes; gene ontology annotations; kernel-based machine learning algorithms; living cells; maximal margin learning algorithm; microarrays; orthologous proteins; position-specific scoring matrices; secondary structure; sequence data; support vector machine; systematic pull-down assays; yeast; Amino acids; Bioinformatics; Biological processes; Genomics; Inference algorithms; Kernel; Learning systems; Machine learning algorithms; Ontologies; Proteins;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
  • Print_ISBN
    0-7695-2194-0
  • Type

    conf

  • DOI
    10.1109/CSB.2004.1332485
  • Filename
    1332485
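
  • Illustrative sketch (not part of the original record)
    The abstract says that pairwise relationships between proteins are encoded with diffusion kernels over a graph, but the record does not give the exact formulation. The sketch below assumes the common graph-Laplacian form K = exp(-beta * L), where L = D - A for a symmetric adjacency (similarity) matrix A. The function name `diffusion_kernel`, the parameter `beta`, and the four-node toy graph are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=1.0):
    """Return exp(-beta * L) for the graph Laplacian L = D - A.

    Assumption: `adjacency` is a symmetric, non-negative matrix whose
    entry (i, j) scores the similarity between proteins i and j.
    """
    A = np.asarray(adjacency, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.allclose(A, A.T):
        raise ValueError("adjacency must be symmetric")

    # Graph Laplacian: degree matrix minus adjacency matrix.
    L = np.diag(A.sum(axis=1)) - A

    # L is symmetric, so the matrix exponential can be computed
    # from its eigendecomposition: exp(-beta L) = V exp(-beta w) V^T.
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


# Toy example: four proteins linked in a chain 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
K = diffusion_kernel(A, beta=0.5)
print(np.round(K, 3))
```

    The resulting matrix is symmetric and positive definite, so it could in principle be passed to a maximal margin learner that accepts a precomputed kernel matrix. Larger values of `beta` spread similarity further along the graph; how the authors chose this parameter is not stated in the record.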