DocumentCode :
952019
Title :
Solving the Problem of Trans-Genomic Query with Alignment Tables
Author :
Parker, Douglass Stott ; Hsiao, Ruey-Lung ; Xing, Yi ; Resch, Alissa M. ; Lee, Christopher J.
Author_Institution :
Dept. of Comput. Sci., Univ. of California at Los Angeles, Los Angeles, CA
Volume :
5
Issue :
3
Year :
2008
Firstpage :
432
Lastpage :
447
Abstract :
The trans-genomic query (TGQ) problem, enabling the free query of biological information even across genomes, is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered. An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables is the hit table: a table of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships and can represent any useful connection between sequence locations. They can be curated and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations. Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location data type that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides the indexing and operators on locations required for querying alignment tables. This paper also reviews several successful large-scale applications of alignment tables for TGQ. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
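The idea of an alignment table as a binary relationship on locations can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Location` class, the half-open interval convention, the `aligned_to` helper, and the sample sequence names are hypothetical and are not taken from BLASTGRES, which the abstract describes as a POSTGRESQL extension with its own location type, operators, and indexing. The sketch uses a plain linear scan where a real system would use an index.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Location:
    """A sequence segment: half-open interval [start, end) on a named sequence.

    Hypothetical type for illustration; not the BLASTGRES location type.
    """
    seq_id: str
    start: int
    end: int

    def overlaps(self, other: "Location") -> bool:
        # Two locations overlap only if they lie on the same sequence
        # and their intervals share at least one position.
        return (self.seq_id == other.seq_id
                and self.start < other.end
                and other.start < self.end)


# An alignment table modeled as a list of (source, target) location pairs.
AlignmentTable = List[Tuple[Location, Location]]


def aligned_to(table: AlignmentTable, query: Location) -> List[Location]:
    """Return targets whose source location overlaps the query (linear scan)."""
    return [dst for src, dst in table if src.overlaps(query)]


# Made-up data: two alignments between invented sequence names.
table: AlignmentTable = [
    (Location("human_chr1", 100, 200), Location("mouse_chr4", 5000, 5100)),
    (Location("human_chr1", 900, 950), Location("mouse_chr4", 7000, 7050)),
]
hits = aligned_to(table, Location("human_chr1", 150, 160))
```

In this toy form, a query across genomes reduces to an overlap test on locations followed by a lookup of the paired location, which is the kind of reduction to "tables of locations" the abstract refers to. Efficient indexing of locations, named in the abstract as a key challenge, is deliberately left out.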
Keywords :
data integrity; genetics; medical computing; medical information systems; query processing; very large databases; BLAST; BLASTGRES; TGQ; alignment tables; binary relationship; bioinformatics; biological information; common off-the-shelf database system; data integration; genomic analysis; hit tables; open-source POSTGRESQL database system; trans-genomic query; Algorithms; Base Sequence; Chromosome Mapping; Conserved Sequence; Databases, Genetic; Molecular Sequence Data; Sequence Alignment; Sequence Analysis, DNA
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.1073
Filename :
4359868