DocumentCode
1907652
Title
Gene ontology prediction using compression based distances and alignment scores on both amino acid sequence and secondary structure
Author
Filiz, Asli ; Çataltepe, Zehra
Author_Institution
Bilgisayar Bilimleri Programi, Istanbul Teknik Univ., Istanbul
fYear
2008
fDate
27-29 Oct. 2008
Firstpage
1
Lastpage
6
Abstract
Normalized compression distance (NCD) is a compression-based pairwise distance measure. NCD has been shown to perform well in different domains, such as music, biological sequence, and text classification. In this study, we use NCD together with Smith-Waterman (SW) alignment scores of protein sequences for gene ontology prediction. We find that using secondary structure in addition to the amino acid sequence improves prediction performance for both NCD and SW alignment scores when each measure is used on its own. The best contribution ratio of secondary structure is 0.25 for SW alignment scores and 0.50 for NCD scores. We also investigate using NCD and SW together on both the amino acid sequence and the secondary structure. We find that this combination gives better prediction than NCD alone, but worse prediction than SW alone.
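As a rough illustration of the two ideas in the abstract, the sketch below computes NCD with the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and then mixes a sequence-based score with a secondary-structure-based score. Several things here are assumptions, not details from the paper: the compressor (zlib), the function names, and the reading of "contribution ratio" as the weight w in the linear blend (1 - w) * d_seq + w * d_ss.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def combined_score(d_seq: float, d_ss: float, ss_ratio: float) -> float:
    """Blend a sequence score with a secondary-structure score.

    ss_ratio is the assumed "contribution ratio" of secondary structure:
    0.0 uses the amino acid sequence only, 1.0 uses secondary structure only.
    """
    if not 0.0 <= ss_ratio <= 1.0:
        raise ValueError("ss_ratio must lie in [0, 1]")
    return (1.0 - ss_ratio) * d_seq + ss_ratio * d_ss


if __name__ == "__main__":
    # Toy strings, not real proteins.
    seq_a = "ACDEFGHIKLMNPQRSTVWY" * 10
    seq_b = "YWVTSRQPNMLKIHGFEDCA" * 10
    print(ncd(seq_a, seq_a), ncd(seq_a, seq_b))
    # Ratio 0.50 is the value the abstract reports as best for NCD scores.
    print(combined_score(0.2, 0.6, 0.50))
```

With this reading, the ratios reported in the abstract would correspond to `ss_ratio=0.25` for SW alignment scores and `ss_ratio=0.50` for NCD scores. SW scores are similarities and not distances, so they would need their own normalization before being blended with NCD.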
Keywords
biology computing; genetics; ontologies (artificial intelligence); proteins; Smith-Waterman alignment scores; amino acid sequence; gene ontology prediction; normalized compression distance; pairwise distance measure; protein sequence; secondary structure; Amino acids; Bioinformatics; Biological processes; Databases; Humans; Mice; Ontologies; Organisms; Proteins; Text categorization; Gene Ontology; Normalized Compression Distance; Smith-Waterman alignment score
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Sciences, 2008. ISCIS '08. 23rd International Symposium on
Conference_Location
Istanbul
Print_ISBN
978-1-4244-2880-9
Electronic_ISBN
978-1-4244-2881-6
Type
conf
DOI
10.1109/ISCIS.2008.4717967
Filename
4717967