  • DocumentCode
    2340211
  • Title
    Automated learning of genome sequences by computational intelligence
  • Author
    Yang, Mary Qu ; Yang, Jack Y. ; Luo, Zuojie ; Ersoy, Okan K.
  • Author_Institution
    Purdue Electr. & Comput. Eng. Sch., Purdue Univ., West Lafayette, IN
  • fYear
    0
  • fDate
    0-0 0
  • Abstract
    The advent of high-throughput sequencing technology has led to an explosion of available DNA sequence data. However, the structures and functions of the proteins encoded by sequenced genomes remain largely unknown. Automated identification of protein functions and interactions has relied largely on known 3D structures or sequence homologues. In particular, intrinsically unstructured (disordered) proteins lack specific 3D structures and are unconserved during evolution, yet they play central roles in diseases characterized by protein misfolding and aggregation. Can we assign protein functions to sequences without relying on 3D structures, and thereby provide useful information for the study of diseases? We developed machine learning techniques to rapidly assess protein functions from sequences. The problem of assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and thus can have multiple functions (owing to complex interactions among proteins). Consequently, instances in the resulting classification problem can carry multiple class labels. We developed a tree-based classifier that is capable of classifying multiply labeled data and gained insight into the multi-functional nature of proteins. The algorithm has been combined through ensemble methods with other computational intelligence techniques to form a committee machine. The results compare favorably with those achieved by algorithms such as decision trees and support vector machines.
  • Keywords
    DNA; biology computing; genetics; learning (artificial intelligence); pattern classification; proteins; trees (mathematics); DNA sequence data; automated learning; computational intelligence; disordered proteins; genome sequences; intrinsic unstructured proteins; machine learning; protein functions; protein interactions; tree-based classifier; Bioinformatics; Classification tree analysis; Computational intelligence; DNA; Decision trees; Diseases; Genomics; Machine learning; Protein engineering; Protein sequence; Protein function; classification;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computational Intelligence Methods and Applications, 2005 ICSC Congress on
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0020-1
  • Type
    conf
  • DOI
    10.1109/CIMA.2005.1662321
  • Filename
    1662321
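The abstract does not give algorithmic details, so what follows is only a hypothetical Python sketch of the two ideas it names: choosing tree splits on multiply labeled data, and combining several classifiers into a committee. It assumes a split criterion that sums the binary entropy of each label's presence (one common multi-label adaptation of information gain, not necessarily the criterion the authors used) and a simple per-label majority vote for the committee. The function names `multilabel_entropy`, `best_split`, and `committee_vote` are illustrative assumptions, not names from the paper.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Set, Tuple

# Hypothetical sketch under stated assumptions; NOT the authors' published algorithm.


def multilabel_entropy(label_sets: Sequence[Set[str]], all_labels: Set[str]) -> float:
    """Sum, over every label, of the binary entropy of 'label present'.

    Assumed criterion: each label is treated as its own yes/no outcome,
    so a single instance (protein) may carry several labels at once.
    """
    n = len(label_sets)
    if n == 0:
        return 0.0
    total = 0.0
    for label in all_labels:
        p = sum(1 for s in label_sets if label in s) / n
        for q in (p, 1.0 - p):
            if q > 0.0:
                total -= q * math.log2(q)
    return total


def best_split(
    xs: Sequence[float], ys: Sequence[Set[str]], all_labels: Set[str]
) -> Tuple[float, Optional[float]]:
    """Return (gain, threshold) of the best 'x <= threshold' split on one feature."""
    n = len(ys)
    parent = multilabel_entropy(ys, all_labels)
    best_gain, best_t = 0.0, None
    # Candidate thresholds: every distinct value except the largest,
    # so both child nodes are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        child = (len(left) / n) * multilabel_entropy(left, all_labels) + (
            len(right) / n
        ) * multilabel_entropy(right, all_labels)
        gain = parent - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def committee_vote(predictions: Sequence[Set[str]]) -> Set[str]:
    """Assumed combination rule: keep each label predicted by a strict majority of members."""
    counts = Counter(label for pred in predictions for label in pred)
    return {label for label, c in counts.items() if 2 * c > len(predictions)}


if __name__ == "__main__":
    labels = {"A", "B", "C"}
    xs = [0.1, 0.2, 0.8, 0.9]
    ys = [{"A"}, {"A"}, {"B", "C"}, {"B", "C"}]
    print(best_split(xs, ys, labels))
    print(committee_vote([{"A"}, {"A", "B"}, {"B", "C"}]))
```

In the toy data, the threshold 0.2 separates the instances labeled {A} from those labeled {B, C}, so it yields the largest gain. A single-label entropy would have to treat {B, C} as one opaque class. Summing per-label entropies instead lets a node be "pure" for some functions and mixed for others, which is one way to reflect the multi-functional proteins the abstract describes.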