Title :
Gene ontology prediction using compression based distances and alignment scores on both amino acid sequence and secondary structure
Author :
Filiz, Asli ; Çataltepe, Zehra
Author_Institution :
Bilgisayar Bilimleri Programi, Istanbul Teknik Univ., Istanbul
Abstract :
Normalized compression distance (NCD) is a compression-based pairwise distance measure. NCD has been shown to perform well in several domains, such as music, biological sequence, and text classification. In this study, we use NCD together with Smith-Waterman (SW) alignment scores of protein sequences for gene ontology prediction. We find that using secondary structure in addition to the amino acid sequence improves prediction performance when NCD or SW alignment scores are used alone. The best contribution ratio of secondary structure is 0.25 for SW alignment scores and 0.50 for NCD scores. We also investigate using NCD and SW together on both the amino acid sequence and the secondary structure. This combination yields better prediction than NCD alone, but worse prediction than SW alone.
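For readers unfamiliar with NCD, the standard definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length and xy is the concatenation of the two inputs. The sketch below is only an illustration: the choice of zlib as the compressor, the function names, and the convex weighting used for the "contribution ratio" of secondary structure are all assumptions, since the abstract does not state which compressor or combination formula the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences given as strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def combined_score(aa_score: float, ss_score: float, ratio: float) -> float:
    """Hypothetical convex combination of an amino acid score and a
    secondary structure score; `ratio` is the weight given to secondary
    structure (e.g. 0.25 for SW, 0.50 for NCD in the abstract)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return (1.0 - ratio) * aa_score + ratio * ss_score
```

Under these assumptions, a pairwise value for two proteins would be obtained by computing the same measure (NCD or SW) once on the amino acid strings and once on the secondary structure strings, then passing both values to `combined_score` with the chosen ratio.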
Keywords :
biology computing; genetics; ontologies (artificial intelligence); proteins; Smith-Waterman alignment scores; amino acid sequence; gene ontology prediction; normalized compression distance; pairwise distance measure; protein sequence; secondary structure; Amino acids; Bioinformatics; Biological processes; Databases; Humans; Mice; Ontologies; Organisms; Proteins; Text categorization; Gene Ontology; Normalized Compression Distance; Smith-Waterman alignment score
Conference_Title :
Computer and Information Sciences, 2008. ISCIS '08. 23rd International Symposium on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-2880-9
Electronic_ISBN :
978-1-4244-2881-6
DOI :
10.1109/ISCIS.2008.4717967