DocumentCode :
3321624
Title :
GenoMosaic: on-demand multiple genome comparison and comparative annotation
Author :
Gibas, Cynthia ; Sturgill, David ; Weller, Jennifer
Author_Institution :
Virginia Tech, Blacksburg, VA, USA
fYear :
2003
fDate :
10-12 March 2003
Firstpage :
158
Lastpage :
165
Abstract :
GenoMosaic is a portable database application for on-demand multiple genome comparison. We discuss the methods used to generate a GenoMosaic data set from genome sequence data and present the relational data model used in the application. We define an abstraction of genome sequence data (the feature mosaic) that bridges annotation describing features within single genes and annotation that may span multiple genes and intergenic features over long stretches of genomic sequence. The goal of this project is to support new method development for on-demand multiple genome comparison. Each genome to be compared can be modeled as a string of generic features of any type that can be computationally defined, related by adjacency information within and among genomes. The generic feature abstraction makes it possible to study the arrangement of features in the genome at a level of detail that includes RNA genes, putative regulatory regions, SNPs, overlapping transcripts, intron splice junctions, and alternative polyadenylation signals; in short, it incorporates significant sequence details that are not necessarily within protein-coding regions. This abstraction is amenable to functional implementation as a relational data model upon which novel query capabilities can be built, and it provides objects that can be analyzed using algorithms for comparison of strings and lists. As an initial effort, we have implemented a prototype that uses a representative set of comparative and content-based annotation methods to reduce a collection of prokaryotic genomes to a feature mosaic representation. Entity-Relationship modeling was then used to develop a data model capable of storing detailed results, including complete parameters for each instance of analysis.
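The feature-mosaic idea in the abstract (each genome as an ordered string of generic features, related by adjacency, and comparable with string and list algorithms) can be illustrated with a minimal sketch. This is not the GenoMosaic schema or code: the Feature fields, the shared "label" used to match features across genomes, the toy coordinates, and the use of Python's difflib for list comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Feature:
    """A generic feature; all field names here are hypothetical."""
    genome: str
    label: str   # assumed identifier shared by equivalent features across genomes
    kind: str    # e.g. "CDS", "rRNA", "regulatory"
    start: int
    end: int


def mosaic(features):
    """Order one genome's features by position, giving a 'string' of features."""
    return sorted(features, key=lambda f: (f.start, f.end))


def adjacencies(ordered):
    """Pairs of neighbouring feature labels within one genome."""
    return [(a.label, b.label) for a, b in zip(ordered, ordered[1:])]


def shared_blocks(mosaic_a, mosaic_b):
    """Runs of labels found in the same order in both genomes (list comparison)."""
    labels_a = [f.label for f in mosaic_a]
    labels_b = [f.label for f in mosaic_b]
    matcher = SequenceMatcher(None, labels_a, labels_b, autojunk=False)
    return [
        labels_a[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size > 0
    ]


# Toy data, invented for this sketch; not taken from the paper.
genome_a = mosaic([
    Feature("A", "geneA", "CDS", 1, 900),
    Feature("A", "rrnX", "rRNA", 1000, 2500),
    Feature("A", "regP", "regulatory", 2600, 2650),
    Feature("A", "geneB", "CDS", 2700, 3600),
])
genome_b = mosaic([
    Feature("B", "regP", "regulatory", 10, 60),
    Feature("B", "geneB", "CDS", 100, 1000),
    Feature("B", "geneC", "CDS", 1100, 1900),
    Feature("B", "geneA", "CDS", 2000, 2900),
])

pairs_a = adjacencies(genome_a)
conserved = shared_blocks(genome_a, genome_b)
```

In this toy case the regulatory region and the gene that follows it keep their adjacency in both genomes, which is the kind of non-coding detail the generic feature abstraction is meant to expose. In a relational setting, the same Feature fields would map naturally onto table columns, with adjacency stored as a separate relation.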
Keywords :
biology computing; database management systems; genetics; physiological models; proteins; Entity-Relationship modeling; GenoMosaic; RNA genes; alternative polyadenylation signals; comparative annotation; complete parameters; content-based annotation methods; data model; features arrangement; generic feature abstraction; generic features string; intergenic features; intron splice junctions; on-demand multiple genome comparison; overlapping transcripts; portable database application; protein-coding regions; putative regulatory regions; relational data model; Algorithm design and analysis; Bioinformatics; Bridges; Data models; Genomics; Proteins; Prototypes; RNA; Relational databases; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on
Print_ISBN :
0-7695-1907-5
Type :
conf
DOI :
10.1109/BIBE.2003.1188942
Filename :
1188942