• DocumentCode
    3399050
  • Title
    Gene Ontology Automatic Annotation Using a Domain Based Gene Product Similarity Measure
  • Author
    Popescu, Mihail ; Keller, James M. ; Mitchell, Joyce A.
  • Author_Institution
    Dept. of Health Manage. & Informatics, Missouri Univ., Columbia, MO
  • fYear
    2005
  • fDate
    25-25 May 2005
  • Firstpage
    108
  • Lastpage
    113
  • Abstract
    Recent years have seen an explosive growth in the amount of biological data available for analysis. The large volume of data collected makes it necessary to automatically classify and sort such data on a very large scale. Typically, investigators use computational sequence analysis tools to assign functions to newly found gene products. The problem is to find the functions of an unknown gene product given its amino acid sequence. In this work we search for functional similarity between gene products by matching the functional domains that they contain. The domain-based approach addresses the main problem of sequence-based similarity, namely that the region of a gene product matched by a query sequence may not be related to the function of that gene product. We use the hidden Markov representation of a gene product domain as described in the PFAM database, and then infer annotations that come from the Gene Ontology. To compute domain similarity between two gene products we introduce a fuzzy Jaccard similarity measure. We tested our domain-based similarity for the functional annotation of a set of 194 gene products extracted from the ENSEMBL Web site. We compared the domain similarity approach to the traditional way of performing functional annotation using sequence-based similarity (BLAST and Smith-Waterman). The annotation was performed in all cases using a fuzzy K-nearest neighbor algorithm. We found that our domain-based annotation was better than the most common BLAST approach, but not as good as the more complex Smith-Waterman technique. The domain-based annotation achieves a correct annotation rate of about 70% at a false annotation rate of 17%.
  • Keywords
    biology computing; data analysis; genetics; hidden Markov models; ontologies (artificial intelligence); sorting; BLAST; ENSEMBL Web site; Gene Ontology automatic annotation; Smith-Waterman technique; amino acid sequence; computational sequence analysis tool; domain based gene product similarity measure; domain similarity approach; domain-based annotation; functional similarity; fuzzy Jaccard similarity measure; fuzzy K-nearest neighbor algorithm; hidden Markov representation; query sequence; sequence-based similarity; Amino acids; Bioinformatics; Biology computing; Biomedical informatics; Databases; Electric variables measurement; Engineering management; Genomics; Ontologies; Proteins;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems, 2005. FUZZ '05. The 14th IEEE International Conference on
  • Conference_Location
    Reno, NV
  • Print_ISBN
    0-7803-9159-4
  • Type
    conf
  • DOI
    10.1109/FUZZY.2005.1452377
  • Filename
    1452377
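
The abstract names a fuzzy Jaccard similarity between the domain sets of two gene products but does not give its formula. As a rough illustration only, the sketch below uses one common fuzzy generalization of the Jaccard index (sum of minimum memberships over sum of maximum memberships). This may differ from the measure defined in the paper. The function name `fuzzy_jaccard`, the dictionary representation, and the membership values are all hypothetical; the domain accessions are used only as example keys.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard index between two fuzzy sets of domains.

    a, b: dicts mapping a domain identifier to a membership degree in [0, 1].
    Assumption: intersection = min, union = max (a standard fuzzy-set choice,
    not necessarily the definition used in the paper).
    """
    domains = set(a) | set(b)
    intersection = sum(min(a.get(d, 0.0), b.get(d, 0.0)) for d in domains)
    union = sum(max(a.get(d, 0.0), b.get(d, 0.0)) for d in domains)
    # Two empty domain sets share nothing measurable; return 0 by convention.
    return intersection / union if union > 0 else 0.0


# Hypothetical gene products with made-up domain membership degrees.
product_p = {"PF00069": 0.9, "PF07714": 0.4}
product_q = {"PF00069": 0.6}

similarity = fuzzy_jaccard(product_p, product_q)  # 0.6 / (0.9 + 0.4)
```

With crisp memberships (all degrees equal to 1) this reduces to the ordinary Jaccard index, |A ∩ B| / |A ∪ B|. A similarity of this kind could then rank neighbors for a K-nearest-neighbor annotation step like the one the abstract mentions.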