• DocumentCode
    1907652
  • Title
    Gene ontology prediction using compression based distances and alignment scores on both amino acid sequence and secondary structure
  • Author
    Filiz, Asli; Çataltepe, Zehra
  • Author_Institution
    Computer Science Program, Istanbul Technical University, Istanbul
  • fYear
    2008
  • fDate
    27-29 Oct. 2008
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Normalized compression distance (NCD) is a compression-based pairwise distance measure. NCD has been shown to perform well in several domains, such as music, biological sequence, and text classification. In this study, we use NCD together with Smith-Waterman (SW) alignment scores of protein sequences for gene ontology prediction. We find that using secondary structure in addition to the amino acid sequence improves prediction performance, whether NCD or SW alignment scores are used alone. The best contribution ratio of secondary structure is 0.25 for SW alignment scores and 0.50 for NCD scores. We also investigate combining NCD and SW scores computed on both the amino acid sequence and the secondary structure. This combination predicts better than NCD alone, but worse than SW alone.
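    The abstract names two ingredients: the NCD itself and a "contribution ratio" that mixes a secondary-structure distance with an amino-acid distance. The sketch below shows one way these could look. It assumes zlib as the compressor (the record does not say which compressor was used), assumes secondary structure is given as a string over a small alphabet such as H/E/C, and assumes the contribution ratio `w` is a linear weight, i.e. d = (1 - w) * d_aa + w * d_ss. The function names `ncd` and `combined_distance` are hypothetical and are not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data; stands in for C(x).

    Assumption: zlib at maximum level is used as the compressor.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def combined_distance(aa_x: str, aa_y: str, ss_x: str, ss_y: str, w: float) -> float:
    """Hypothetical linear mix of amino-acid and secondary-structure NCDs.

    w plays the role of the secondary-structure contribution ratio
    (assumed to be a simple linear weight in [0, 1]).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * ncd(aa_x, aa_y) + w * ncd(ss_x, ss_y)
```

    With a real compressor, NCD(x, x) is small but usually not exactly 0, because compressing the concatenation xx still costs a few extra bytes; related sequences give lower values than unrelated ones. Under the linear-weight assumption, w = 0.50 (the best ratio reported for NCD) would weight the two distances equally.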
  • Keywords
    biology computing; genetics; ontologies (artificial intelligence); proteins; Smith-Waterman alignment scores; amino acid sequence; gene ontology prediction; normalized compression distance; pairwise distance measure; protein sequence; secondary structure; Amino acids; Bioinformatics; Biological processes; Databases; Humans; Mice; Ontologies; Organisms; Text categorization; Gene Ontology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer and Information Sciences, 2008. ISCIS '08. 23rd International Symposium on
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-1-4244-2880-9
  • Electronic_ISBN
    978-1-4244-2881-6
  • Type
    conf
  • DOI
    10.1109/ISCIS.2008.4717967
  • Filename
    4717967